Reputation: 183
I've got an issue with the following, which is a simple code that calculates some means on a sports dataset. If needed I can provide an example of what these data look like.
Right now attempting to append the generated dataframes back-to-back, which I am not able to do. Basically I would like to take all the datasets (there are two here, but more to be used), and to generate a timeline based on the outputs. All the files have the same column and indices. Below is the code I'm using:
import pandas as pd
season_list = ["2017-2018","2018-2019"]
excl_value = 10
header_list = []
for seas in season_list:
excel_sheet = pd.read_excel(r'Data\NData' + str(seas) + '.xlsx')
excel_sheet = excel_sheet.fillna(value = 0)
headers = excel_sheet.columns.values.tolist()
if headers not in header_list:
header_list.append(headers)
for header in header_list:
data_mean = excel_sheet[header][excel_sheet.games >=
excl_value].mean()
data_mean = round(data_mean,2)
print(data_mean)
With this I simply get two datasets, which look akin to the below:
age 25.42
games 22.42
games_starts 18.19
age 24.95
games 27.18
games_starts 22.00
These are the results of two different calculations of the mean, for a few variables. Now ideally I'd like to have these broken down by file, and by the same index concurrently. I'd see it working as below:
2017-2018 2018-2019
age 25.42 24.95
games 22.42 27.18
games_starts 18.19 22.00
Like this I can then create a timeline of the values together. However struggling to achieve this. How should I go about combining the calculations run, into one set all together? Thanks!
I've already tested a few methods of achieving this. One was just simply appending the excel sheets with .append, but I am seeing some issues with the values not coming together. I also thought about just grabbing the mean calculations and applying them to their own dictionary/list/dataframe, but I haven't managed to actually push them into their own set.
Upvotes: 0
Views: 161
Reputation: 788
You can do it nicely by 3 steps. Load:
sports_data = [pd.read_excel(r'Data\NData' + str(season) + '.xlsx') for season in season_list]
Now you'd like to add a column to identify each sheet such that you can group on it in the next step:
for i, season in enumerate(season_list):
sports_data[i]['season'] = season
Combine to a single DataFrame:
sports_data = pd.concat(sports_data)
Now we are ready to group by season and take the mean:
sports_data.groupby('season').mean()
This will give you
age games games_starts
season
2017-2018 26.666667 33.000000 22.333333
2018-2019 25.666667 25.666667 19.000000
What is left now is only to transpose it to get what you want:
sports_data.groupby('season').mean().T
Upvotes: 1