Sedric Hibler

Reputation: 87

Finding correlating scores within a team in a Pandas dataframe

I have a list of NBA player scores that spans several days. My goal is to determine which players score well together on the same day.

My data set contains the date, player name, team and points scored as columns:

Date        Team  Name             Points
2020-12-22  LAL   Dennis Schroder  43
2020-12-22  LAL   LeBron James     35
2020-12-22  LAL   Kyle Kuzma       15.75
2020-12-23  LAL   Dennis Schroder  22
2020-12-23  LAL   LeBron James     23.25
2020-12-23  LAL   Kyle Kuzma       39.75
2020-12-24  LAL   Dennis Schroder  40
2020-12-24  LAL   LeBron James     55.25
2020-12-24  LAL   Kyle Kuzma       7

Link: https://docs.google.com/spreadsheets/d/e/2PACX-1vSqawsLtGqzIoptqIXY8MLF0TlLtMSoiXuE2EM3HgiAXrbXCnYTSSfI5pF0KYuzH_lYKU00dU6ED_76/pub?gid=0&single=true&output=csv

Ideally, I would filter down to one team and run something like df.T.corr() to get a matrix that correlates each player with the other players on that same team.

import pandas as pd
df = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSqawsLtGqzIoptqIXY8MLF0TlLtMSoiXuE2EM3HgiAXrbXCnYTSSfI5pF0KYuzH_lYKU00dU6ED_76/pub?gid=0&single=true&output=csv")
playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team']=='LAL']
playerdf.corr()     #only correlates the columns to each other 
playerdf.T.corr()   #returns an empty dataframe

In my example, I would expect a correlation matrix to show a positive correlation between LeBron and Dennis, and a negative correlation between Kyle and each of those two players.

Upvotes: 0

Views: 237

Answers (1)

chitown88

Reputation: 28565

Correlations only work with numerical variables. When you are looking at correlations, you are essentially asking, "as x increases/decreases, does y increase/decrease?"

Your question is framed correctly: "As LeBron James' scoring increases/decreases, does player B's scoring increase/decrease?" BUT your data is not set up to answer that.

playerdf.T
Out[66]: 
                    2             4    ...              409         423
Name    Dennis Schroder  LeBron James  ...  Markieff Morris  Marc Gasol
Date         2020-12-22    2020-12-22  ...       2020-12-25  2020-12-25
Points               43         35.25  ...            24.25       12.75
Team                LAL           LAL  ...              LAL         LAL

[4 rows x 26 columns]

I'm curious as to how they score fractions of a point???

We need to pivot so that each instance/row is a date/game, and the columns are the players' names, with the values being the points scored. Once you do that, you can pass it to the .corr() method.
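To make the reshaping concrete, here is a minimal sketch that uses only the nine sample rows posted in the question (typed in by hand rather than read from the spreadsheet). The names long_df, wide_df and sample_corr are just illustrative.

```python
import pandas as pd

# The nine sample rows from the question, in long format:
# one row per player per date.
rows = [
    ("2020-12-22", "LAL", "Dennis Schroder", 43),
    ("2020-12-22", "LAL", "LeBron James", 35),
    ("2020-12-22", "LAL", "Kyle Kuzma", 15.75),
    ("2020-12-23", "LAL", "Dennis Schroder", 22),
    ("2020-12-23", "LAL", "LeBron James", 23.25),
    ("2020-12-23", "LAL", "Kyle Kuzma", 39.75),
    ("2020-12-24", "LAL", "Dennis Schroder", 40),
    ("2020-12-24", "LAL", "LeBron James", 55.25),
    ("2020-12-24", "LAL", "Kyle Kuzma", 7),
]
long_df = pd.DataFrame(rows, columns=["Date", "Team", "Name", "Points"])

# Reshape to wide format: one row per date, one column per player.
wide_df = long_df.pivot(index="Date", columns="Name", values="Points")

# Pairwise Pearson correlation between the player columns.
sample_corr = wide_df.corr()
print(sample_corr.round(2))
```

On these nine rows the Schroder/James entry comes out positive and both Kuzma entries come out negative, which is the pattern the question expected. Three dates is still far too few to read anything into it.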

That said, you're not going to see much with just 2 games/dates of data:

import pandas as pd
file = "https://docs.google.com/spreadsheets/d/e/2PACX-1vRlZiz12o4zOCRrjuTgBFlUwRjWKz2v2o4-B8dZ6C-kHwkmI5wRWMO4vS9u2bRVtCy9UJkwPXp-BKCw/pub?gid=0&single=true&output=csv"

df = pd.read_csv(file)

playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team']=='LAL']

playerdf = playerdf.pivot(index='Date',
                              columns='Name',
                              values='Points').fillna(0)

corr = playerdf.corr()

Output:

print (corr.to_string())
Name                      Alex Caruso  Alfonzo McKinnie  Anthony Davis  Dennis Schroder  Jared Dudley  Kentavious Caldwell-Pope  Kyle Kuzma  LeBron James  Marc Gasol  Markieff Morris  Montrezl Harrell  Quinn Cook  Talen Horton-Tucker  Wes Matthews
Name                                                                                                                                                                                                                                                   
Alex Caruso                       1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Alfonzo McKinnie                  1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Anthony Davis                     1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Dennis Schroder                  -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Jared Dudley                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Kentavious Caldwell-Pope         -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Kyle Kuzma                        1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
LeBron James                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Marc Gasol                        1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Markieff Morris                   1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Montrezl Harrell                 -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Quinn Cook                        NaN               NaN            NaN              NaN           NaN                       NaN         NaN           NaN         NaN              NaN               NaN         NaN                  NaN           NaN
Talen Horton-Tucker               1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Wes Matthews                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
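The wall of 1.0, -1.0 and NaN above is what you should expect from only two rows: a line through two points fits perfectly, so every pair is either perfectly positively or perfectly negatively correlated, and a player whose points never change (for example 0 in both games after fillna(0)) has no variance and gives NaN. A small sketch with made-up numbers (the player names and points are hypothetical):

```python
import pandas as pd

# Hypothetical points for three players over only two games.
two_games = pd.DataFrame(
    {
        "Player A": [10, 20],  # scores more in game 2
        "Player B": [30, 5],   # scores less in game 2
        "Player C": [0, 0],    # same value in both games, so zero variance
    },
    index=["game 1", "game 2"],
)

two_game_corr = two_games.corr()
print(two_game_corr)
```

Player A against Player B comes out as -1.0 no matter how large the differences are, and every entry involving Player C is NaN, including the diagonal.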

If I go back and get a full season's worth:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/teams/LAL/2019_games.html'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
links = table.find_all('a', href=True)

boxscore_links = []
for link in links:
    if 'boxscores' in link['href'] and '.html' in link['href']:
        boxscore_links.append('https://www.basketball-reference.com' + link['href'])
        
# Collect one frame per game, then concatenate once at the end
# (DataFrame.append was removed in pandas 2.0).
frames = []
for link in boxscore_links:
    print(link)
    temp_df = pd.read_html(link, header=1, attrs={'id': 'box-LAL-game-basic'})[0]
    temp_df = temp_df[['Starters', 'PTS']]
    # Drop the header/summary rows that are not players
    temp_df = temp_df[temp_df['Starters'] != 'Team Totals']
    temp_df = temp_df[temp_df['Starters'] != 'Reserves']
    # Count players who did not play as scoring 0 points
    temp_df['PTS'] = temp_df['PTS'].replace('Did Not Play', 0)
    temp_df['PTS'] = temp_df['PTS'].replace('Did Not Dress', 0)
    temp_df['PTS'] = temp_df['PTS'].replace('Not With Team', 0)
    temp_df['PTS'] = temp_df['PTS'].astype(int)
    # The game date is the leading run of digits in the box score file name
    temp_df['Date'] = re.findall(r"\d+", link.split('/')[-1].split('.html')[0])[0]
    temp_df = temp_df.rename(columns={'Starters': 'Name', 'PTS': 'Points'})
    frames.append(temp_df)

playerdf = pd.concat(frames, sort=False).reset_index(drop=True)

playerdf = playerdf.pivot(index='Date',
                              columns='Name',
                              values='Points').fillna(0)

corr = playerdf.corr()

Then you might find some correlations:

Output:

print (corr.to_string())
Name                      Alex Caruso  Andre Ingram  Brandon Ingram  Isaac Bonga  Ivica Zubac  JaVale McGee  Jemerrio Jones  Johnathan Williams  Josh Hart  Kentavious Caldwell-Pope  Kyle Kuzma  Lance Stephenson  LeBron James  Lonzo Ball  Michael Beasley  Mike Muscala  Moritz Wagner  Rajon Rondo  Reggie Bullock  Scott Machado  Sviatoslav Mykhailiuk  Tyson Chandler
Name                                                                                                                                                                                                                                                                                                                                                                         
Alex Caruso                  1.000000           NaN       -0.502772     0.356931    -0.223081      0.360708        0.520267            0.635980  -0.377755                  0.331362   -0.427086         -0.279960     -0.258477   -0.395673        -0.190208      0.614652       0.462480     0.282011        0.295477       0.180002              -0.240216       -0.272816
Andre Ingram                      NaN           NaN             NaN          NaN          NaN           NaN             NaN                 NaN        NaN                       NaN         NaN               NaN           NaN         NaN              NaN           NaN            NaN          NaN             NaN            NaN                    NaN             NaN
Brandon Ingram              -0.502772           NaN        1.000000    -0.311075     0.280328     -0.212760       -0.252852           -0.502750   0.064457                 -0.330685    0.015547         -0.034681     -0.116722    0.068030         0.256519     -0.273952      -0.423331    -0.075037       -0.010224      -0.167714              -0.029635        0.142737
Isaac Bonga                  0.356931           NaN       -0.311075     1.000000    -0.014284      0.052887        0.212814            0.317496  -0.170178                  0.018247   -0.210940          0.033076     -0.215860   -0.107862        -0.046352      0.249809       0.506899     0.069940       -0.003765       0.237553               0.191829       -0.104224
Ivica Zubac                 -0.223081           NaN        0.280328    -0.014284     1.000000     -0.348919       -0.125094           -0.255467   0.097697                  0.003421    0.032512          0.154095     -0.462171    0.142622         0.449249     -0.204575      -0.046258    -0.060691       -0.268645      -0.082973               0.308421        0.115336
JaVale McGee                 0.360708           NaN       -0.212760     0.052887    -0.348919      1.000000        0.131512            0.203464  -0.195306                  0.088362   -0.161654          0.007220      0.071916   -0.250259        -0.189589      0.220799       0.025695     0.074450        0.051457       0.142273              -0.038746       -0.271256
Jemerrio Jones               0.520267           NaN       -0.252852     0.212814    -0.125094      0.131512        1.000000            0.544439  -0.246812                  0.401716   -0.362906         -0.201776     -0.287865   -0.191340        -0.111905      0.805160       0.250571     0.039685       -0.040080      -0.032381              -0.126897       -0.151910
Johnathan Williams           0.635980           NaN       -0.502750     0.317496    -0.255467      0.203464        0.544439            1.000000  -0.223735                  0.216588   -0.335991         -0.076575     -0.112725   -0.280153        -0.212707      0.530976       0.638914     0.057808        0.074619       0.179093              -0.220783       -0.310233
Josh Hart                   -0.377755           NaN        0.064457    -0.170178     0.097697     -0.195306       -0.246812           -0.223735   1.000000                 -0.202327    0.112090          0.106432      0.062429    0.359006         0.053293     -0.312218      -0.323296    -0.165224       -0.300856      -0.163708               0.190857        0.196536
Kentavious Caldwell-Pope     0.331362           NaN       -0.330685     0.018247     0.003421      0.088362        0.401716            0.216588  -0.202327                  1.000000   -0.254029         -0.053019     -0.329252   -0.151266        -0.087638      0.381221       0.187377     0.011464        0.038160       0.039444               0.037875        0.050367
Kyle Kuzma                  -0.427086           NaN        0.015547    -0.210940     0.032512     -0.161654       -0.362906           -0.335991   0.112090                 -0.254029    1.000000          0.039111      0.187677    0.355282         0.081492     -0.370250      -0.338748    -0.254589       -0.105824       0.049026               0.018252        0.141192
Lance Stephenson            -0.279960           NaN       -0.034681     0.033076     0.154095      0.007220       -0.201776           -0.076575   0.106432                 -0.053019    0.039111          1.000000     -0.048462    0.085465         0.009354     -0.265252      -0.066810    -0.071756       -0.357791       0.079382               0.264893        0.044603
LeBron James                -0.258477           NaN       -0.116722    -0.215860    -0.462171      0.071916       -0.287865           -0.112725   0.062429                 -0.329252    0.187677         -0.048462      1.000000   -0.021212        -0.417934     -0.336107      -0.227264     0.032238        0.098842      -0.119156              -0.177819       -0.099600
Lonzo Ball                  -0.395673           NaN        0.068030    -0.107862     0.142622     -0.250259       -0.191340           -0.280153   0.359006                 -0.151266    0.355282          0.085465     -0.021212    1.000000         0.078883     -0.312913      -0.298580    -0.442047       -0.410911      -0.126914               0.211892        0.520982
Michael Beasley             -0.190208           NaN        0.256519    -0.046352     0.449249     -0.189589       -0.111905           -0.212707   0.053293                 -0.087638    0.081492          0.009354     -0.417934    0.078883         1.000000     -0.183008       0.025792    -0.254584       -0.240322      -0.074226               0.167759        0.073540
Mike Muscala                 0.614652           NaN       -0.273952     0.249809    -0.204575      0.220799        0.805160            0.530976  -0.312218                  0.381221   -0.370250         -0.265252     -0.336107   -0.312913        -0.183008      1.000000       0.306389     0.203155        0.207427      -0.052954              -0.207525       -0.248431
Moritz Wagner                0.462480           NaN       -0.423331     0.506899    -0.046258      0.025695        0.250571            0.638914  -0.323296                  0.187377   -0.338748         -0.066810     -0.227264   -0.298580         0.025792      0.306389       1.000000     0.016732        0.147417       0.341310              -0.074224       -0.206353
Rajon Rondo                  0.282011           NaN       -0.075037     0.069940    -0.060691      0.074450        0.039685            0.057808  -0.165224                  0.011464   -0.254589         -0.071756      0.032238   -0.442047        -0.254584      0.203155       0.016732     1.000000        0.378034      -0.021978              -0.267364       -0.450237
Reggie Bullock               0.295477           NaN       -0.010224    -0.003765    -0.268645      0.051457       -0.040080            0.074619  -0.300856                  0.038160   -0.105824         -0.357791      0.098842   -0.410911        -0.240322      0.207427       0.147417     0.378034        1.000000      -0.069539              -0.272518       -0.296419
Scott Machado                0.180002           NaN       -0.167714     0.237553    -0.082973      0.142273       -0.032381            0.179093  -0.163708                  0.039444    0.049026          0.079382     -0.119156   -0.126914        -0.074226     -0.052954       0.341310    -0.021978       -0.069539       1.000000              -0.084170       -0.100761
Sviatoslav Mykhailiuk       -0.240216           NaN       -0.029635     0.191829     0.308421     -0.038746       -0.126897           -0.220783   0.190857                  0.037875    0.018252          0.264893     -0.177819    0.211892         0.167759     -0.207525      -0.074224    -0.267364       -0.272518      -0.084170               1.000000        0.255530
Tyson Chandler              -0.272816           NaN        0.142737    -0.104224     0.115336     -0.271256       -0.151910           -0.310233   0.196536                  0.050367    0.141192          0.044603     -0.099600    0.520982         0.073540     -0.248431      -0.206353    -0.450237       -0.296419      -0.100761               0.255530        1.000000
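A matrix this wide is hard to scan by eye. One way to pull out the strongest pairs is to keep only the cells above the diagonal (so each pair appears once and the 1.0 self-correlations drop out) and sort them. This is only a sketch: corr_demo below is a small made-up matrix standing in for corr, and the player names in it are hypothetical.

```python
import numpy as np
import pandas as pd

# Small made-up correlation matrix standing in for `corr`.
names = ["Player A", "Player B", "Player C"]
corr_demo = pd.DataFrame(
    [
        [1.0, 0.8, -0.5],
        [0.8, 1.0, 0.1],
        [-0.5, 0.1, 1.0],
    ],
    index=names,
    columns=names,
)

# Boolean mask that is True only strictly above the diagonal.
upper = np.triu(np.ones(corr_demo.shape, dtype=bool), k=1)

# Blank out everything else, flatten to (player, player) pairs,
# drop the blanked cells and sort from most positive to most negative.
pairs = corr_demo.where(upper).stack().dropna().sort_values(ascending=False)
print(pairs)
```

The same three lines should work on the real corr in place of corr_demo. Pairs involving a player with no variance (such as the all-NaN Andre Ingram row) are dropped by dropna().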

Heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

f, ax = plt.subplots(figsize=(10, 8))
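# The all-False mask hides nothing, so every cell of the matrix is drawn.
# (np.bool was removed from NumPy; the built-in bool is used instead.)
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)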


Upvotes: 1
