Reputation: 633
I've imported an excel sheet which has a series of tables. The pandas dataframe looks like this:
1 2 3 4
0 3 2 7 2
1 4 2 8 1
2 5 1 4 1
3 6 0 2 3
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
1 2 3 4
0 3 2 2 1
1 3 3 9 1
2 3 1 5 1
3 2 9 4 1
......
I'd like to average all values in each respective cell (ie. average values in row 0, column 1 of each table) resulting in 1 table which contains all the averages.
I'm not sure how to alter the df.groupby(['1']).mean()
function in order to isolate the cells by row as well. I can use a loop to iterate through columns, but it may be tricky to do that and iterate through the rows simultaneously. I'd appreciate any suggestions.
Desired output:
1 2 3 4
0 3 2 4.5 1.5
1 3.5 2.5 8.5 1
2 4 1 4.5 1
3 4 4.5 3 2
Upvotes: 1
Views: 58
Reputation: 862521
If first column is index and same columns names in each subDataFrame simpliest is:
print (df)
1 2 3 4
0.0 3.0 2.0 7.0 2.0
1.0 4.0 2.0 8.0 1.0
2.0 5.0 1.0 4.0 1.0
3.0 6.0 0.0 2.0 3.0
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN 1.0 2.0 3.0 4.0
0.0 3.0 2.0 2.0 1.0
1.0 3.0 3.0 9.0 1.0
2.0 3.0 1.0 5.0 1.0
3.0 2.0 9.0 4.0 1.0
df = df.groupby(level=0).mean()
print (df)
1 2 3 4
0.0 3.0 2.0 4.5 1.5
1.0 3.5 2.5 8.5 1.0
2.0 4.0 1.0 4.5 1.0
3.0 4.0 4.5 3.0 2.0
If not, is necessary some preprocessing, depends of data.
E.g. remove rows if last column has NaN
s for avoid mixed row=1 with indexes=1:
print (df)
1 2 3 4
0.0 3.0 2.0 7.0 2.0
1.0 4.0 2.0 8.0 1.0
2.0 5.0 1.0 4.0 1.0
3.0 6.0 0.0 2.0 3.0
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
1.0 2.0 3.0 4.0 NaN <- columns names like index=1 here are removed
0.0 3.0 2.0 2.0 1.0
1.0 3.0 3.0 9.0 1.0
2.0 3.0 1.0 5.0 1.0
3.0 2.0 9.0 4.0 1.0
df = df.dropna(subset=df.columns[-1:]).groupby(level=0).mean()
print (df)
1 2 3 4
0.0 3.0 2.0 4.5 1.5
1.0 3.5 2.5 8.5 1.0
2.0 4.0 1.0 4.5 1.0
3.0 4.0 4.5 3.0 2.0
Upvotes: 3