I have a list of NBA player scores that spans several days. My goal is to determine which players score well together on the same day.
My data set contains the date, player name, team and points scored as columns:
Date Team Name Points
2020-12-22 LAL Dennis Schroder 43
2020-12-22 LAL LeBron James 35
2020-12-22 LAL Kyle Kuzma 15.75
2020-12-23 LAL Dennis Schroder 22
2020-12-23 LAL LeBron James 23.25
2020-12-23 LAL Kyle Kuzma 39.75
2020-12-24 LAL Dennis Schroder 40
2020-12-24 LAL LeBron James 55.25
2020-12-24 LAL Kyle Kuzma 7
Ideally I will be able to filter down to one team and run something like df.T.corr() to get a matrix of each player's name against the other players on that same team.
import pandas as pd
df = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSqawsLtGqzIoptqIXY8MLF0TlLtMSoiXuE2EM3HgiAXrbXCnYTSSfI5pF0KYuzH_lYKU00dU6ED_76/pub?gid=0&single=true&output=csv")
playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team']=='LAL']
playerdf.corr()    # only correlates the numeric columns (just Points) with each other
playerdf.T.corr()  # returns an empty DataFrame
In my example, I'd expect the correlation matrix to show a positive correlation between LeBron and Dennis, and a negative correlation between Kyle and each of them.
Correlations only work with numerical variables. When you are looking at correlations, you are essentially asking, "as x increases/decreases, does y increase/decrease?"
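Your question makes sense in that form: "As LeBron James' scoring increases/decreases, does player B's scoring increase/decrease?" BUT your data is not set up to do that. Each row holds one player's points on one date, so transposing just turns Name, Date, Points and Team into rows; every resulting column mixes strings and numbers, so .corr() has no numeric columns left to work with, hence the empty DataFrame: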
playerdf.T
Out[66]:
2 4 ... 409 423
Name Dennis Schroder LeBron James ... Markieff Morris Marc Gasol
Date 2020-12-22 2020-12-22 ... 2020-12-25 2020-12-25
Points 43 35.25 ... 24.25 12.75
Team LAL LAL ... LAL LAL
[4 rows x 26 columns]
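A minimal sketch of the same effect, using two hand-typed rows in the question's long format rather than the real sheet: after the transpose, every column is object dtype, so nothing numeric remains for a correlation.

```python
import pandas as pd

# Two hand-typed rows in the question's long format (one row per player per date)
playerdf = pd.DataFrame({
    "Name": ["Dennis Schroder", "LeBron James"],
    "Date": ["2020-12-22", "2020-12-22"],
    "Points": [43, 35],
    "Team": ["LAL", "LAL"],
})

transposed = playerdf.T
# Each transposed column mixes strings and numbers, so pandas stores it as object
print(transposed.dtypes)

# No numeric columns survive the transpose, so corr() has nothing to correlate
numeric_part = transposed.select_dtypes(include="number")
print(numeric_part.shape)
```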
Side note: I'm curious how they score fractions of a point.
We need to pivot so that each row is a date/game, the columns are the player names, and the values are the points scored. Once you do that, you can pass it to the .corr() method.
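To make the pivot concrete, here is a sketch that uses the nine sample rows from the question (typed in by hand rather than read from the Google Sheet), pivots them to one row per date and one column per player, and correlates the players:

```python
import pandas as pd

# The nine sample rows from the question, in long format
rows = [
    ("2020-12-22", "LAL", "Dennis Schroder", 43),
    ("2020-12-22", "LAL", "LeBron James", 35),
    ("2020-12-22", "LAL", "Kyle Kuzma", 15.75),
    ("2020-12-23", "LAL", "Dennis Schroder", 22),
    ("2020-12-23", "LAL", "LeBron James", 23.25),
    ("2020-12-23", "LAL", "Kyle Kuzma", 39.75),
    ("2020-12-24", "LAL", "Dennis Schroder", 40),
    ("2020-12-24", "LAL", "LeBron James", 55.25),
    ("2020-12-24", "LAL", "Kyle Kuzma", 7),
]
df = pd.DataFrame(rows, columns=["Date", "Team", "Name", "Points"])

# One row per date, one column per player, points as the values
wide = df[df["Team"] == "LAL"].pivot(index="Date", columns="Name", values="Points")

# Pearson correlation between every pair of player columns
corr = wide.corr()
print(corr.round(2))
```

On these three dates the signs match what the question expected: Dennis and LeBron move together, and Kyle moves against both of them.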
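Here it is on your sheet, although with just 2 games/dates of data you're not going to see much: any two players whose points changed between the games will correlate at exactly +1 or -1, since two points always fit a straight line perfectly.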
import pandas as pd
file = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRlZiz12o4zOCRrjuTgBFlUwRjWKz2v2o4-B8dZ6C-kHwkmI5wRWMO4vS9u2bRVtCy9UJkwPXp-BKCw/pub?gid=0&single=true&output=csv'
df = pd.read_csv(file)
playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team'] == 'LAL']
# One row per date, one column per player; a player with no row for a date gets 0
playerdf = playerdf.pivot(index='Date', columns='Name', values='Points').fillna(0)
corr = playerdf.corr()
print(corr.to_string())
Output:
Name Alex Caruso Alfonzo McKinnie Anthony Davis Dennis Schroder Jared Dudley Kentavious Caldwell-Pope Kyle Kuzma LeBron James Marc Gasol Markieff Morris Montrezl Harrell Quinn Cook Talen Horton-Tucker Wes Matthews
Name
Alex Caruso 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Alfonzo McKinnie 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Anthony Davis 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Dennis Schroder -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Jared Dudley 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Kentavious Caldwell-Pope -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Kyle Kuzma 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
LeBron James 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Marc Gasol 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Markieff Morris 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Montrezl Harrell -1.0 -1.0 -1.0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 -1.0 1.0 NaN -1.0 -1.0
Quinn Cook NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Talen Horton-Tucker 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
Wes Matthews 1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 1.0 1.0 1.0 -1.0 NaN 1.0 1.0
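A quick check with made-up numbers for hypothetical players A to D shows why a two-game matrix looks like this: two players whose points both change get exactly +1 (same direction) or -1 (opposite directions), and a player whose points never change, like Quinn Cook above, has zero variance and gets NaN everywhere, including the diagonal.

```python
import numpy as np
import pandas as pd

# Hypothetical points for four players over only two games
two_games = pd.DataFrame(
    {"A": [10, 30], "B": [5, 12], "C": [20, 8], "D": [0, 0]},
    index=["game1", "game2"],
)

corr = two_games.corr()
print(corr)
```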
If I go back and scrape a full season's worth of box scores instead (the 2018-19 Lakers, from basketball-reference.com):
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/teams/LAL/2019_games.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
links = table.find_all('a', href=True)

# Collect the box-score link for every game on the schedule
boxscore_links = []
for link in links:
    if 'boxscores' in link['href'] and '.html' in link['href']:
        boxscore_links.append('https://www.basketball-reference.com' + link['href'])

frames = []
for link in boxscore_links:
    print(link)
    temp_df = pd.read_html(link, header=1, attrs={'id': 'box-LAL-game-basic'})[0]
    temp_df = temp_df[['Starters', 'PTS']]
    # Drop the totals row and the 'Reserves' divider row
    temp_df = temp_df[~temp_df['Starters'].isin(['Team Totals', 'Reserves'])].copy()
    # Status strings such as 'Did Not Play', 'Did Not Dress' or 'Not With Team' become 0 points
    temp_df['PTS'] = pd.to_numeric(temp_df['PTS'], errors='coerce').fillna(0).astype(int)
    # Game id from the file name (e.g. 201810180POR -> '201810180'); unique per game
    temp_df['Date'] = re.findall(r"\d+", link.split('/')[-1].split('.html')[0])[0]
    temp_df = temp_df.rename(columns={'Starters': 'Name', 'PTS': 'Points'})
    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
playerdf = pd.concat(frames, ignore_index=True)
playerdf = playerdf.pivot(index='Date', columns='Name', values='Points').fillna(0)
corr = playerdf.corr()
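The Date value built in that loop is the box-score id (for example 201810180), which works as a unique index but is not a real date. If you want an actual date, a hypothetical helper like the one below could be used instead; it assumes basketball-reference box-score file names start with YYYYMMDD, followed by a digit and the home team's abbreviation (e.g. 201810180POR.html).

```python
import re
from datetime import datetime


def boxscore_date(link):
    """Hypothetical helper: basketball-reference box-score URL -> ISO date string.

    Assumes the file name starts with an 8-digit YYYYMMDD date.
    """
    file_name = link.rsplit("/", 1)[-1]
    match = re.match(r"\d{8}", file_name)
    if match is None:
        raise ValueError(f"unexpected box-score link: {link}")
    return datetime.strptime(match.group(), "%Y%m%d").date().isoformat()


print(boxscore_date("https://www.basketball-reference.com/boxscores/201810180POR.html"))
```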
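With a full season of games, some correlations start to show up. (Andre Ingram's row is all NaN because his points never varied across the games collected.)
print(corr.to_string())
Output: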
Name Alex Caruso Andre Ingram Brandon Ingram Isaac Bonga Ivica Zubac JaVale McGee Jemerrio Jones Johnathan Williams Josh Hart Kentavious Caldwell-Pope Kyle Kuzma Lance Stephenson LeBron James Lonzo Ball Michael Beasley Mike Muscala Moritz Wagner Rajon Rondo Reggie Bullock Scott Machado Sviatoslav Mykhailiuk Tyson Chandler
Name
Alex Caruso 1.000000 NaN -0.502772 0.356931 -0.223081 0.360708 0.520267 0.635980 -0.377755 0.331362 -0.427086 -0.279960 -0.258477 -0.395673 -0.190208 0.614652 0.462480 0.282011 0.295477 0.180002 -0.240216 -0.272816
Andre Ingram NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Brandon Ingram -0.502772 NaN 1.000000 -0.311075 0.280328 -0.212760 -0.252852 -0.502750 0.064457 -0.330685 0.015547 -0.034681 -0.116722 0.068030 0.256519 -0.273952 -0.423331 -0.075037 -0.010224 -0.167714 -0.029635 0.142737
Isaac Bonga 0.356931 NaN -0.311075 1.000000 -0.014284 0.052887 0.212814 0.317496 -0.170178 0.018247 -0.210940 0.033076 -0.215860 -0.107862 -0.046352 0.249809 0.506899 0.069940 -0.003765 0.237553 0.191829 -0.104224
Ivica Zubac -0.223081 NaN 0.280328 -0.014284 1.000000 -0.348919 -0.125094 -0.255467 0.097697 0.003421 0.032512 0.154095 -0.462171 0.142622 0.449249 -0.204575 -0.046258 -0.060691 -0.268645 -0.082973 0.308421 0.115336
JaVale McGee 0.360708 NaN -0.212760 0.052887 -0.348919 1.000000 0.131512 0.203464 -0.195306 0.088362 -0.161654 0.007220 0.071916 -0.250259 -0.189589 0.220799 0.025695 0.074450 0.051457 0.142273 -0.038746 -0.271256
Jemerrio Jones 0.520267 NaN -0.252852 0.212814 -0.125094 0.131512 1.000000 0.544439 -0.246812 0.401716 -0.362906 -0.201776 -0.287865 -0.191340 -0.111905 0.805160 0.250571 0.039685 -0.040080 -0.032381 -0.126897 -0.151910
Johnathan Williams 0.635980 NaN -0.502750 0.317496 -0.255467 0.203464 0.544439 1.000000 -0.223735 0.216588 -0.335991 -0.076575 -0.112725 -0.280153 -0.212707 0.530976 0.638914 0.057808 0.074619 0.179093 -0.220783 -0.310233
Josh Hart -0.377755 NaN 0.064457 -0.170178 0.097697 -0.195306 -0.246812 -0.223735 1.000000 -0.202327 0.112090 0.106432 0.062429 0.359006 0.053293 -0.312218 -0.323296 -0.165224 -0.300856 -0.163708 0.190857 0.196536
Kentavious Caldwell-Pope 0.331362 NaN -0.330685 0.018247 0.003421 0.088362 0.401716 0.216588 -0.202327 1.000000 -0.254029 -0.053019 -0.329252 -0.151266 -0.087638 0.381221 0.187377 0.011464 0.038160 0.039444 0.037875 0.050367
Kyle Kuzma -0.427086 NaN 0.015547 -0.210940 0.032512 -0.161654 -0.362906 -0.335991 0.112090 -0.254029 1.000000 0.039111 0.187677 0.355282 0.081492 -0.370250 -0.338748 -0.254589 -0.105824 0.049026 0.018252 0.141192
Lance Stephenson -0.279960 NaN -0.034681 0.033076 0.154095 0.007220 -0.201776 -0.076575 0.106432 -0.053019 0.039111 1.000000 -0.048462 0.085465 0.009354 -0.265252 -0.066810 -0.071756 -0.357791 0.079382 0.264893 0.044603
LeBron James -0.258477 NaN -0.116722 -0.215860 -0.462171 0.071916 -0.287865 -0.112725 0.062429 -0.329252 0.187677 -0.048462 1.000000 -0.021212 -0.417934 -0.336107 -0.227264 0.032238 0.098842 -0.119156 -0.177819 -0.099600
Lonzo Ball -0.395673 NaN 0.068030 -0.107862 0.142622 -0.250259 -0.191340 -0.280153 0.359006 -0.151266 0.355282 0.085465 -0.021212 1.000000 0.078883 -0.312913 -0.298580 -0.442047 -0.410911 -0.126914 0.211892 0.520982
Michael Beasley -0.190208 NaN 0.256519 -0.046352 0.449249 -0.189589 -0.111905 -0.212707 0.053293 -0.087638 0.081492 0.009354 -0.417934 0.078883 1.000000 -0.183008 0.025792 -0.254584 -0.240322 -0.074226 0.167759 0.073540
Mike Muscala 0.614652 NaN -0.273952 0.249809 -0.204575 0.220799 0.805160 0.530976 -0.312218 0.381221 -0.370250 -0.265252 -0.336107 -0.312913 -0.183008 1.000000 0.306389 0.203155 0.207427 -0.052954 -0.207525 -0.248431
Moritz Wagner 0.462480 NaN -0.423331 0.506899 -0.046258 0.025695 0.250571 0.638914 -0.323296 0.187377 -0.338748 -0.066810 -0.227264 -0.298580 0.025792 0.306389 1.000000 0.016732 0.147417 0.341310 -0.074224 -0.206353
Rajon Rondo 0.282011 NaN -0.075037 0.069940 -0.060691 0.074450 0.039685 0.057808 -0.165224 0.011464 -0.254589 -0.071756 0.032238 -0.442047 -0.254584 0.203155 0.016732 1.000000 0.378034 -0.021978 -0.267364 -0.450237
Reggie Bullock 0.295477 NaN -0.010224 -0.003765 -0.268645 0.051457 -0.040080 0.074619 -0.300856 0.038160 -0.105824 -0.357791 0.098842 -0.410911 -0.240322 0.207427 0.147417 0.378034 1.000000 -0.069539 -0.272518 -0.296419
Scott Machado 0.180002 NaN -0.167714 0.237553 -0.082973 0.142273 -0.032381 0.179093 -0.163708 0.039444 0.049026 0.079382 -0.119156 -0.126914 -0.074226 -0.052954 0.341310 -0.021978 -0.069539 1.000000 -0.084170 -0.100761
Sviatoslav Mykhailiuk -0.240216 NaN -0.029635 0.191829 0.308421 -0.038746 -0.126897 -0.220783 0.190857 0.037875 0.018252 0.264893 -0.177819 0.211892 0.167759 -0.207525 -0.074224 -0.267364 -0.272518 -0.084170 1.000000 0.255530
Tyson Chandler -0.272816 NaN 0.142737 -0.104224 0.115336 -0.271256 -0.151910 -0.310233 0.196536 0.050367 0.141192 0.044603 -0.099600 0.520982 0.073540 -0.248431 -0.206353 -0.450237 -0.296419 -0.100761 0.255530 1.000000
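Since the goal is to find which players score well together, it can help to turn the matrix into a ranked list of pairs. A minimal sketch, with a hypothetical top_pairs helper that walks each unordered pair once, skips NaN entries, and sorts by correlation; the usage example reuses the three sample players from the question:

```python
from itertools import combinations

import pandas as pd


def top_pairs(corr, n=5):
    """Hypothetical helper: the n most positively correlated player pairs."""
    rows = []
    # Each unordered pair once; the diagonal (a player with himself) is skipped
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        if pd.notna(r):  # skip players with zero variance
            rows.append((a, b, r))
    out = pd.DataFrame(rows, columns=["player_1", "player_2", "r"])
    return out.sort_values("r", ascending=False).head(n).reset_index(drop=True)


# Usage with the question's sample points (one row per date)
points = pd.DataFrame({
    "Dennis Schroder": [43, 22, 40],
    "LeBron James": [35, 23.25, 55.25],
    "Kyle Kuzma": [15.75, 39.75, 7],
})
result = top_pairs(points.corr(), n=3)
print(result)
```

With the season matrix above you would call top_pairs(corr) directly; sorting with ascending=True instead surfaces the pairs that tend to move in opposite directions.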
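To visualize the matrix as a heatmap:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
f, ax = plt.subplots(figsize=(10, 8))
# Use the builtin bool (np.bool is missing on NumPy 1.24-1.26); an all-False mask hides no cells
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool),
            cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)
plt.show()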
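One design choice worth checking: fillna(0) treats a game a player has no row for as a zero-point game, which can manufacture negative correlations between players who rarely shared a game. A sketch of an alternative, assuming you would rather compare only the games both players have points for: leave the NaNs in place and let DataFrame.corr work pairwise, with min_periods requiring a minimum number of shared games (the table and the threshold of 3 are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: NaN means the player has no row for that game
wide = pd.DataFrame(
    {
        "A": [20, 25, 30, np.nan, np.nan],
        "B": [10, 12, 15, 11, 9],
        "C": [np.nan, np.nan, np.nan, 8, 14],
    },
    index=["g1", "g2", "g3", "g4", "g5"],
)

# Option 1 (as in the answer): a missing game counts as 0 points
filled = wide.fillna(0).corr()

# Option 2: pairwise correlation over shared games only,
# NaN unless a pair has at least 3 games in common
pairwise = wide.corr(min_periods=3)

print(filled.round(2))
print(pairwise.round(2))
```

Here A and C never appear in the same game, yet the zero-filled version reports a negative correlation between them, while the pairwise version reports NaN.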