• Student Group: Northwestern University Master of Science in Analytics (MSiA)
  • Team Members: Vincent (Developer), Rush (Project Manager), Lauren (QA)
  • Project Repo

Image of FIFA18 Web App

1. Business Problem

  • Ratings of a player's overall performance are implicit rather than explicit
  • Teams suffer losses when they underestimate a player's value
  • Negotiations to determine a player's value take unnecessarily long

2. Purpose and Objectives

Predict a soccer player's value with predictive models built on features such as base stats, skills, preferred position, and club, so that club managers gain negotiating power in the transfer market, offer sensible transfer fees and wages, and ultimately benefit the club in each deal.

  • Vision: Help club managers gain negotiating power in the transfer market, offer sensible transfer fees and wages, and ultimately benefit the club in each deal.
  • Mission: Predict a player's value, wage, and overall rating with predictive models built on features such as base stats, skills, preferred position, and club.
  • Success criteria: An R^2 greater than 80% for the transfer market value prediction model, and a demonstration that the web application is effective.
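As a reminder of what the success criterion measures, here is a minimal sketch of R^2 (the coefficient of determination) in plain Python. The function names `r_squared` and `meets_success_criterion` are illustrative and do not come from the project code; the 0.80 threshold is the 80% target stated above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def meets_success_criterion(y_true, y_pred, threshold=0.80):
    """True when the model explains more than `threshold` of the variance."""
    return r_squared(y_true, y_pred) > threshold
```

A model that always predicts the mean scores 0, and a perfect model scores 1, so the 80% target asks the model to remove most of the variance in player value.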

3. Business Impact

  • Create latent attributes to evaluate players
  • Build a model to predict player transfer value
  • Help club managers gain negotiating power
  • Offer sensible transfer fees

Image of FIFA18_1

4. Full Stack Development

  1. Develop the web app using the Flask framework in Python.
  2. Front End: Design the web app's pages using the Bootstrap framework (HTML, CSS, JavaScript)
  3. Controller: Flask
  4. Back End:
    • AWS RDS database (MySQL) to store user input and prediction results
    • scikit-learn package in Python to create, test, and pickle the model
  5. Deployment
    • AWS Elastic Beanstalk
    • EC2
  6. Testing and reproducibility
    • Unit tests
    • Logging
    • YAML file
    • Makefile
  7. Documentation
    • Sphinx to auto-document the code
    • GitHub repo using the Cookiecutter Data Science structure
    • README.md with instructions for developers to set up the configuration
  8. Project Management
    • Experience three key roles in a data science project: Project Manager, Developer, and QA.
    • Pivotal Tracker for project management
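To make the controller and back-end pieces concrete, here is a minimal sketch of a Flask prediction endpoint. It is not the project's code: the `/predict` route, the `FEATURES` list, and the `StandInModel` class are hypothetical, and the stand-in replaces the pickled scikit-learn model that a real app would load with `pickle.load`. Storing inputs and predictions in the database is left out.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical subset of the model's input features.
FEATURES = ["Age", "Overall_Rating", "Potential"]


class StandInModel:
    """Placeholder with a scikit-learn style predict(); not the real model."""

    def predict(self, rows):
        # Arbitrary arithmetic so the endpoint returns something.
        return [1000.0 * sum(row) for row in rows]


# A real app would instead unpickle the trained model at start-up.
model = StandInModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    # Order the inputs the same way the model was trained.
    row = [float(payload[name]) for name in FEATURES]
    value = model.predict([row])[0]
    return jsonify({"predicted_value": value})
```

Keeping the model behind a single `predict` call means the route does not change when the stand-in is swapped for the unpickled model.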

5. Dataset

  1. Player Attribute Data
    • Ball control
    • Crossing
    • etc
  2. Player Personal Data
    • Age
    • Potential
    • etc
  3. Player Playing Position Data
    • Preferred Position
    • etc

6. Variables after selection

  1. numeric:
    • continuous: wage, Overall_Rating, Potential, Composure, Marking, Reactions, Vision, Volleys
    • discrete: Age
  2. categorical:
    • ordinal: Num_Positions
    • nominal: Position
    • binary:

7. Imputation

  1. Standardize the wage and value units (some values are given in thousands, K, and others in millions, M)
  2. Use KNN to impute missing attribute scores
    • Separate the attributes into attacking and defending groups
    • Impute each missing value from the other attributes in the same group.
    • For example, impute volleys from the closest players' data, where distance is calculated over the attacking attributes.
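The two steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the project's implementation: the `€95.5M` / `€565K` string format, the function names `parse_money` and `knn_impute`, the column names, and the choice of `k` are all assumed, and the project may well have used a library imputer instead.

```python
import math
import re


def parse_money(text):
    """Convert strings such as '€95.5M' or '€565K' to a number of euros."""
    match = re.fullmatch(r"\s*€?\s*([0-9]+(?:\.[0-9]+)?)\s*([KkMm]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized money format: {text!r}")
    amount, suffix = match.groups()
    multiplier = {"": 1, "k": 1_000, "m": 1_000_000}[suffix.lower()]
    return float(amount) * multiplier


def knn_impute(players, target, group, k=3):
    """Fill missing `target` values (None) with the mean of the k nearest donors.

    Distance is Euclidean over the other attributes in `group`, which are
    assumed to be present for every player.
    """
    features = [name for name in group if name != target]
    donors = [p for p in players if p[target] is not None]
    if not donors:
        raise ValueError("no player has a known value for the target attribute")

    def vector(player):
        return [player[name] for name in features]

    imputed = []
    for player in players:
        filled = dict(player)  # copy so the input rows stay untouched
        if player[target] is None:
            nearest = sorted(
                donors, key=lambda d: math.dist(vector(player), vector(d))
            )[:k]
            filled[target] = sum(d[target] for d in nearest) / len(nearest)
        imputed.append(filled)
    return imputed


# Hypothetical attacking group with one missing Volleys score.
attacking = ["Finishing", "Shot_Power", "Volleys"]
players = [
    {"Finishing": 90, "Shot_Power": 88, "Volleys": 86},
    {"Finishing": 88, "Shot_Power": 90, "Volleys": 84},
    {"Finishing": 30, "Shot_Power": 35, "Volleys": 28},
    {"Finishing": 89, "Shot_Power": 89, "Volleys": None},
]
completed = knn_impute(players, "Volleys", attacking, k=2)
```

Restricting the distance to one attribute group keeps a striker's missing volleys score from being filled in from players who only resemble him defensively.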

8. Feature Engineering

  1. Create a player continent attribute mapped from the player's country
  2. Create a league label mapped from the player's team (Serie A, etc.)
  3. Select the primary position as the player's preferred position
  4. Count the number of positions to capture player flexibility.
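A minimal sketch of these four features, assuming each raw record carries `Nationality`, `Club`, and a space-separated `Preferred_Positions` string. The column names, the tiny lookup tables, and the rule that the first listed position is the primary one are all assumptions made for illustration.

```python
# Hypothetical lookup tables; a real pipeline would cover every country and club.
COUNTRY_TO_CONTINENT = {
    "Argentina": "South America",
    "Portugal": "Europe",
    "Egypt": "Africa",
}
CLUB_TO_LEAGUE = {
    "Juventus": "Serie A",
    "FC Barcelona": "La Liga",
    "Liverpool": "Premier League",
}


def engineer_features(player):
    """Derive continent, league, primary position, and position count."""
    positions = player["Preferred_Positions"].split()
    if not positions:
        raise ValueError("player has no preferred position listed")
    return {
        "Continent": COUNTRY_TO_CONTINENT.get(player["Nationality"], "Unknown"),
        "League": CLUB_TO_LEAGUE.get(player["Club"], "Other"),
        # Assumption: the first listed position is the primary one.
        "Position": positions[0],
        "Num_Positions": len(positions),
    }


example = engineer_features(
    {"Nationality": "Argentina", "Club": "FC Barcelona", "Preferred_Positions": "RW ST CF"}
)
```

Unmapped countries and clubs fall back to explicit `Unknown` and `Other` labels, so gaps in the lookup tables stay visible in the data.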

9. Modeling

  1. Linear Regression
  2. Ridge Regression
  3. Lasso Regression
  4. Random Forest
  5. Neural Network

10. Evaluation

After comparing cross-validation R^2 across the models, we selected Random Forest.
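A comparison of this kind can be sketched with scikit-learn as follows. The data is synthetic, and the hyperparameters, fold count, and the name `compare_models` are assumptions, so the scores say nothing about the project's actual results. The neural network is left out to keep the sketch short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def compare_models(X, y, cv=5):
    """Return the mean cross-validated R^2 for each candidate model."""
    candidates = {
        "Linear Regression": LinearRegression(),
        "Ridge Regression": Ridge(alpha=1.0),
        "Lasso Regression": Lasso(alpha=0.1),
        "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in candidates.items()
    }


# Synthetic stand-in for the player data: three features, nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, size=200)

scores = compare_models(X, y)
best = max(scores, key=scores.get)  # model with the highest mean R^2
```

Scoring every model on the same folds with the same metric keeps the comparison fair, and the mean across folds is less sensitive to one lucky split than a single train/test score.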

Image of FIFA18_1

11. Findings and Insights

  1. Potential and Age are the most influential attributes
  2. Attacking attributes are more important than defending attributes
  3. Position flexibility is not important
  4. Position “Discrimination”
  5. Overall rating and potential converge as age increases