Reputation: 491
Fair warning: this question requires a non-standard Python package, nba_api. I have a list with 3 elements, each of which contains another list with 2 elements: a player data frame and a team data frame. What is the recommended way to achieve the following desired result: 1 combined player data frame and 1 combined team data frame? Coming from an R background, I would tackle this problem by: 1. joining the players data frame with the team data frame into joined_list, then 2. using do.call(rbind, joined_list) to row bind the results into one data frame. I understand this might be very elementary to a lot of experienced Python users, but I'm having a hell of a time trying to find the right approach after many searches on here.
import nba_api
import requests
import pandas as pd
from nba_api.stats.endpoints import boxscoreadvancedv2
# vector of game ids (test purposes)
gameids = ['0021900001','0021900002','0021900012']
headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))
# manually access elements of list and output to data frame
## there has to be an easier way to access list elements and rowbind the results!!!
df_out0 = temp[0].get_data_frames()
df_player0 = df_out0[0]
df_team0 = df_out0[1]
df_out1 = temp[1].get_data_frames()
df_player1 = df_out1[0]
df_team1 = df_out1[1]
Upvotes: 2
Views: 502
Reputation: 2702
First of all, congratulations on sticking it through and finding a solution on your own! :D
lst_1 = [1, 2, 3, 4]
for i in range(len(lst_1)):
    print(lst_1[i])
can be written as
lst_1 = [1, 2, 3, 4]
for item in lst_1:
    print(item)
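If you ever do need the position as well as the element, the built-in enumerate gives you both without going back to range(len(...)). A minimal sketch (the values and the names position and pairs are just for illustration):

```python
lst_1 = [10, 20, 30, 40]

# enumerate yields (index, element) pairs, so no manual indexing is needed
pairs = []
for position, item in enumerate(lst_1):
    pairs.append((position, item))

print(pairs)  # [(0, 10), (1, 20), (2, 30), (3, 40)]
```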
Bonus: Notice the changes I make to variable names. See PEP 8 for a general reference on Python style.
gameids = ['0021900001','0021900002','0021900012']
headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))
can be written as
game_ids = ['0021900001','0021900002','0021900012']
api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
api_results = [
    boxscoreadvancedv2.BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers)
    for curr_game_id in game_ids
]
# output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0]) # index 0 will always contain player frame
df_players = pd.concat(df_players)
print(df_players)
# output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1]) # index 1 will always contain team frame
df_team = pd.concat(df_team)
print(df_team)
Using the first two tips, here is what we end up with:
players_lst = []
team_lst = []
for curr_res in api_results:
    curr_dfs = curr_res.get_data_frames()
    players_lst.append(curr_dfs[0])
    team_lst.append(curr_dfs[1])
players_df = pd.concat(players_lst)
team_df = pd.concat(team_lst)
Here it is, broken down slightly for the sake of clarity.
import pandas as pd
from nba_api.stats.endpoints.boxscoreadvancedv2 import BoxScoreAdvancedV2
game_ids = ['0021900001', '0021900002', '0021900012']
api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
# generator of results from the API
api_results = (BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids)
# generator of lists of DataFrames from the API results
# think of it like: [[Player DF, Team DF], [Player DF, Team DF], ...]
api_res_dfs = (curr_res.get_data_frames() for curr_res in api_results)
# unpacking the size 2 lists of DataFrames into 2 flat lists
# [[Player DF, Team DF], [Player DF, Team DF], ...] -> [Player DF, Player DF, ...], [Team DF, Team DF, ...]
# see https://stackoverflow.com/q/2921847/11301900 for more on the use of the asterisk (*)
players_tupe, team_tupe = zip(*api_res_dfs)
# concatenating the various DataFrames, exactly the same as in your original code
players_df = pd.concat(players_tupe)
team_df = pd.concat(team_tupe)
print(players_df)
print(team_df)
This approach hinges on two facts: as you pointed out, the player DataFrame is always first in the list and the team DataFrame is always second; and those are the only two items in each list of results.
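To make that dependency concrete, here is a small sketch of the same zip(*...) unpacking using plain strings as stand-ins for the DataFrames (the names api_res, api_res_3 and unpack_error are made up for illustration, and nothing here calls the real API). It also shows what would happen if each result ever carried a third item, and one possible way to guard against it by slicing:

```python
# Hypothetical stand-ins: strings instead of real DataFrames
api_res = [
    ["players_g1", "team_g1"],
    ["players_g2", "team_g2"],
    ["players_g3", "team_g3"],
]

# Transpose the list of pairs into one tuple of players and one of teams
players, teams = zip(*api_res)
print(players)  # ('players_g1', 'players_g2', 'players_g3')
print(teams)    # ('team_g1', 'team_g2', 'team_g3')

# If every result had a third item, zip would yield three tuples and
# unpacking into two names would raise ValueError
api_res_3 = [pair + ["extra"] for pair in api_res]
unpack_error = None
try:
    players, teams = zip(*api_res_3)
except ValueError as err:
    unpack_error = err

# Slicing each result to its first two items restores the expected shape
players, teams = zip(*(res[:2] for res in api_res_3))
```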
Let me know if you have any questions :)
Upvotes: 1
Reputation: 491
After a bit more reading (and clarity) I was able to combine the manual parts of my code in for loops that generate one list with player data and one list with team data. Then, using this post, "Concatenate a list of pandas dataframes together", I was able to combine the player and team lists into respective data frames.
## output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0]) # index 0 will always contain player frame
df_players = pd.concat(df_players)
print(df_players)
## output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1]) # index 1 will always contain team frame
df_team = pd.concat(df_team)
print(df_team)
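One thing worth noting about pd.concat: by default each frame keeps its own row index, so the combined frame ends up with repeated index values. Passing ignore_index=True renumbers the rows instead. A small sketch with toy frames (the column names and values below are invented for illustration, not the real API output):

```python
import pandas as pd

# Hypothetical stand-ins for two per-game player frames
game_a = pd.DataFrame({"GAME_ID": ["A", "A"], "PLAYER": ["p1", "p2"]})
game_b = pd.DataFrame({"GAME_ID": ["B", "B"], "PLAYER": ["p3", "p4"]})

# Default behaviour: the original 0..n-1 index of each frame is kept
stacked = pd.concat([game_a, game_b])

# ignore_index=True builds a fresh 0..N-1 index for the combined frame
renumbered = pd.concat([game_a, game_b], ignore_index=True)

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Whether the repeated index matters depends on what you do next; it mostly shows up when selecting rows by label with .loc.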
Upvotes: 1