Reputation: 391
I have a df called resulttable that looks like:
df Index Cybi_Tag Tag Exp. m/z Intensity Norm_Intensity
114 17 C09.xls 1013.328 1000 0.5
114 17 C10.xls 1013.325 1100 0.6
114 17 D09.xls 1013.33 1200 0.7
114 17 D10.xls 1013.331 1300 0.8
114 18 C11.xls 1013.328 2000 0.9
114 18 C12.xls 1013.326 2100 1.0
114 18 D11.xls 1013.326 2200 1.1
114 18 D12.xls 1013.328 2300 1.2
180 17 C09.xls 1079.47 3000 1.3
180 17 D09.xls 1079.432 3200 1.5
180 17 D10.xls 1079.446 3300 1.6
180 18 C11.xls 1079.452 4000 1.7
180 18 C12.xls 1079.442 4100 1.8
Where df Index values are the indices of the dataframe when resulttable is exported to csv file, and Cybi_Tag values were added to the dataframe as tagging numbers to group the tags. Tag column contains the names of the files that the data have come from.
My goal is to calculate mean, std, and CV of Intensity and Norm_Intensity values for each Cybi_Tag within df Index. So, for example, for df Index 114, Cybi_Tag 17, I would like to group 1013.328-ish values (so C09.xls, C10.xls, D09.xls, D10.xls), and find the average of Intensity and Norm_Intensity values.
I table I want should look like:
df Index Cybi_Tag Exp. m/z Intensity Norm_Intensity
114 17 1013.329 1150 0.65
114 18 1013.327 2150 1.05
180 17 1079.449 3167 1.467
180 18 1079.455 4050 1.75
So for the above table,
I tried to do groupby on df Index and Cybi_Tag by:
resulttable_calc = resulttable.groupby(level=0)['Cybi_Tag']
or
resulttable_calc = resulttable.groupby['Cybi_Tag']
However, the first code makes an error message of Column not found: Cybi_Tag. The second code doesn't produce an error, but it groups the entire Cybi_Tag, so it groups C09, C10, D09, D10 for all Exp. m/z values, so I am left with two rows at the end, which is not what I want.
resulttable_calc = resulttable.groupby(level= 0&1)
The above code only groups by the df Index, so it groups the Exp. m/z, regardless of the tag values.
Upvotes: 2
Views: 1138
Reputation: 863281
I think you need list
of levels separated by ,
:
#specify order of MultiIndex levels
resulttable_calc = resulttable.groupby(level=[0,1])
#specify name of MultiIndex levels
resulttable_calc = resulttable.groupby(level=['df Index','Cybi_Tag'])
If use last version of pandas is possible use:
#same as specify columns names
resulttable_calc = resulttable.groupby(['df Index','Cybi_Tag'])
And then aggregate with remove MultiIndex in columns by map
:
resulttable_calc = resulttable.groupby(level=[0,1])
.agg({'Intensity':['mean','std'], 'Exp. m/z': 'mean'})
print (resulttable_calc)
Exp. m/z Intensity
mean mean std
df Index Cybi_Tag
114 17 1013.328500 1150.000000 129.099445
18 1013.327000 2150.000000 129.099445
180 17 1079.449333 3166.666667 152.752523
18 1079.447000 4050.000000 70.710678
resulttable_calc.columns = resulttable_calc.columns.map('_'.join)
d = {'Intensity_mean':'Average','Exp. m/z_mean':'Exp. m/z', 'Intensity_std':'Norm_Intensity'}
resulttable_calc = resulttable_calc.rename(columns=d).reset_index()
print (resulttable_calc)
df Index Cybi_Tag Exp. m/z Average Norm_Intensity
0 114 17 1013.328500 1150.000000 129.099445
1 114 18 1013.327000 2150.000000 129.099445
2 180 17 1079.449333 3166.666667 152.752523
3 180 18 1079.447000 4050.000000 70.710678
Upvotes: 2