Reputation: 131
I am trying to add rows in data frame as part of a loop.
The programme loops through URLs and extracts data in a data frame format
for id in game_ids:
df_team_final = []
df_player_final = []
url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
content = requests.get(url)
data = json.loads(content.content)
At the end of the loop I have used concat to merge the two df of the away / home teams (and players)
team_full = pd.concat([df_home_team, df_away_team])
player_full = pd.concat([df_home_player_merge, df_away_player_merge])
Outside the loop I have then programmed to save as Excel
# #if cant find it, create new spread sheet
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.save()
writer.close()
As as I am looping through a number of web pages I need to update the df as I go, obviously in the current format I am just overwriting the first url with the second loop
What is the best way to append or add to the dataframe at the end of the loop?
Thank you
Upvotes: 1
Views: 3938
Reputation: 3910
I can give here only a bare outline as we don't see the full code.
I assume you don't append scraped data to some sort of a container, so it gets lost after the next iteration.
# empty lists outside of loop to store data
df_team_final = []
df_player_final = []
for id in game_ids:
url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
content = requests.get(url)
data = json.loads(content.content)
# create dataframes that you need
# df_home_team, df_away_team etc
# and append data to containers
team_full = pd.concat([df_home_team, df_away_team])
player_full = pd.concat([df_home_player_merge, df_away_player_merge])
df_team_final.append(team_full)
df_player_final.append(player_full )
Now that you stored dataframes as lists, you can merge them with pandas.concat
# outside of the loop
team_full = pd.concat(df_team_final)
player_full = pd.concat(df_player_final)
and save at once:
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.save()
writer.close()
From the file you shared I see that you added containers inside of a loop:
But you should put them before the loop begins:
# initialize them here
df_team_final = []
df_player_final = []
for id in game_ids:
Upvotes: 1