Adam Sewell
Adam Sewell

Reputation: 131

Add rows to pandas data frame at the end of a loop

I am trying to add rows in data frame as part of a loop.

The programme loops through URLs and extracts data in a data frame format

for id in game_ids:
    df_team_final = []
    df_player_final = []
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)

At the end of the loop I have used concat to merge the two df of the away / home teams (and players)

    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])

Outside the loop I have then programmed to save as Excel

# #if cant find it, create new spread sheet
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.save()
writer.close()

As as I am looping through a number of web pages I need to update the df as I go, obviously in the current format I am just overwriting the first url with the second loop

What is the best way to append or add to the dataframe at the end of the loop?

Thank you

Upvotes: 1

Views: 3938

Answers (1)

Hryhorii Pavlenko
Hryhorii Pavlenko

Reputation: 3910

I can give here only a bare outline as we don't see the full code.

I assume you don't append scraped data to some sort of a container, so it gets lost after the next iteration.

# empty lists outside of loop to store data
df_team_final = []
df_player_final = []

for id in game_ids:
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)

    # create dataframes that you need
    # df_home_team, df_away_team etc
    # and append data to containers

    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])

    df_team_final.append(team_full)
    df_player_final.append(player_full )

Now that you stored dataframes as lists, you can merge them with pandas.concat

# outside of the loop
team_full = pd.concat(df_team_final)
player_full = pd.concat(df_player_final)

and save at once:

writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.save()
writer.close()

Edit

From the file you shared I see that you added containers inside of a loop:

enter image description here

But you should put them before the loop begins:

# initialize them here
df_team_final = []
df_player_final = []
for id in game_ids:

Upvotes: 1

Related Questions