FIFA World Cup Predictor

Credits: Dan and Shiv


Background

The World Cup is the most prominent sporting event in the world, with over 1 billion people tuning in. Soccer's bare-bones nature makes it easy to understand and adopt and lets almost anyone play it, regardless of socioeconomic background, and its exciting atmosphere makes it enjoyable for all. The World Cup has become a worldwide cultural phenomenon, and the stakes are high: the winning nation receives a great deal of international fame and prestige. Yet even as the World Cup has grown in international recognition and viewership, the state of analytics on World Cup matches (and soccer matches in general) remains archaic relative to other sports such as basketball and American football. Soccer and World Cup analytics are on the rise, however, and within the scope of this project we plan to build a prediction model that contributes to them further.

Motivation

Shiv is an avid soccer player, and Daniel, although he doesn’t play soccer, follows professional and college soccer enthusiastically. We would love to delve into the world of sports analytics and become more informed about how professional statisticians ‘conjure up’ their predictions, while also learning why many of their guesses fail. We are excited to dig into the many grey-area factors that are often overlooked by common World Cup match predictions and (hopefully) build a model that produces accurate predictions and lets us impress our friends at future World Cup viewing parties.


Problem Statement

  1. Preparation and Cleaning of Data: Construct a comprehensive dataset with pre-processed features/predictors scraped from our data sources.
  2. Construction of Model (1): Construct a prediction model of the 2018 World Cup that accurately builds a complete framework of national teams from a variety of sources.
  3. Construction of Model (2): Construct a prediction model of the 2018 World Cup from scratch, relying on added features instead of FIFA rankings.
  4. Comparing Results: Compare the results from our models and determine our best model by testing against the 2018 FIFA World Cup results.
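
The comparison in step 4 could be sketched roughly as follows. This is only a minimal illustration, assuming each model emits one predicted outcome per match ("home", "draw", or "away") and that models are ranked by simple accuracy; the function names, the model names, and the toy outcomes are hypothetical placeholders, not actual 2018 World Cup results or our final evaluation method.

```python
from typing import Dict, List


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of matches whose predicted outcome equals the actual outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no matches to score")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def best_model(predictions: Dict[str, List[str]], actual: List[str]) -> str:
    """Return the name of the model with the highest accuracy."""
    scores = {name: accuracy(preds, actual) for name, preds in predictions.items()}
    return max(scores, key=scores.get)


# Placeholder outcomes for four hypothetical matches (not real results).
actual = ["home", "draw", "away", "home"]
predictions = {
    "model_1": ["home", "home", "away", "home"],  # hypothetical output of Model (1)
    "model_2": ["away", "draw", "away", "draw"],  # hypothetical output of Model (2)
}

for name, preds in predictions.items():
    print(name, accuracy(preds, actual))
print("best:", best_model(predictions, actual))
```

Plain accuracy is the simplest choice here; if the models produce probabilities rather than hard outcomes, a probabilistic score would likely be a better fit.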

Question that we will address:

(from the 2018 FIFA World Cup)

  1. “Can we create a comprehensive and interpretable framework to both accurately represent team strength in the World Cup and appropriately weigh any factors that may impact team strength and contribute to the result?”

Citation: Creative Commons