python pandas use groupby using two criteria/columns

Question

I have a df called resulttable that looks like:

df Index     Cybi_Tag     Tag       Exp. m/z     Intensity     Norm_Intensity
114          17           C09.xls   1013.328     1000          0.5
114          17           C10.xls   1013.325     1100          0.6
114          17           D09.xls   1013.33      1200          0.7
114          17           D10.xls   1013.331     1300          0.8
114          18           C11.xls   1013.328     2000          0.9
114          18           C12.xls   1013.326     2100          1.0
114          18           D11.xls   1013.326     2200          1.1
114          18           D12.xls   1013.328     2300          1.2
180          17           C09.xls   1079.47      3000          1.3
180          17           D09.xls   1079.432     3200          1.5
180          17           D10.xls   1079.446     3300          1.6
180          18           C11.xls   1079.452     4000          1.7
180          18           C12.xls   1079.442     4100          1.8

Where df Index values are the indices of the dataframe when resulttable is exported to csv file, and Cybi_Tag values were added to the dataframe as tagging numbers to group the tags. Tag column contains the names of the files that the data have come from.

My goal is to calculate mean, std, and CV of Intensity and Norm_Intensity values for each Cybi_Tag within df Index. So, for example, for df Index 114, Cybi_Tag 17, I would like to group 1013.328-ish values (so C09.xls, C10.xls, D09.xls, D10.xls), and find the average of Intensity and Norm_Intensity values.

I table I want should look like:

df Index     Cybi_Tag       Exp. m/z     Intensity     Norm_Intensity
114          17             1013.329     1150          0.65
114          18             1013.327     2150          1.05
180          17             1079.449     3167          1.467
180          18             1079.455     4050          1.75

So for the above table,

I tried to do groupby on df Index and Cybi_Tag by:

resulttable_calc = resulttable.groupby(level=0)['Cybi_Tag']

or

resulttable_calc = resulttable.groupby['Cybi_Tag']

However, the first code makes an error message of Column not found: Cybi_Tag. The second code doesn't produce an error, but it groups the entire Cybi_Tag, so it groups C09, C10, D09, D10 for all Exp. m/z values, so I am left with two rows at the end, which is not what I want.

resulttable_calc = resulttable.groupby(level= 0&1)

The above code only groups by the df Index, so it groups the Exp. m/z, regardless of the tag values.

jezrael · Accepted Answer

I think you need list of levels separated by ,:

#specify order of MultiIndex levels
resulttable_calc = resulttable.groupby(level=[0,1])
#specify name of MultiIndex levels
resulttable_calc = resulttable.groupby(level=['df Index','Cybi_Tag'])

If use last version of pandas is possible use:

#same as specify columns names
resulttable_calc = resulttable.groupby(['df Index','Cybi_Tag'])

And then aggregate with remove MultiIndex in columns by map:

resulttable_calc = resulttable.groupby(level=[0,1])
                              .agg({'Intensity':['mean','std'], 'Exp. m/z': 'mean'})
print (resulttable_calc)
                      Exp. m/z    Intensity            
                          mean         mean         std
df Index Cybi_Tag                                      
114      17        1013.328500  1150.000000  129.099445
         18        1013.327000  2150.000000  129.099445
180      17        1079.449333  3166.666667  152.752523
         18        1079.447000  4050.000000   70.710678

resulttable_calc.columns = resulttable_calc.columns.map('_'.join)
d = {'Intensity_mean':'Average','Exp. m/z_mean':'Exp. m/z', 'Intensity_std':'Norm_Intensity'}
resulttable_calc = resulttable_calc.rename(columns=d).reset_index()
print (resulttable_calc)
   df Index  Cybi_Tag     Exp. m/z      Average  Norm_Intensity
0       114        17  1013.328500  1150.000000      129.099445
1       114        18  1013.327000  2150.000000      129.099445
2       180        17  1079.449333  3166.666667      152.752523
3       180        18  1079.447000  4050.000000       70.710678

python pandas use groupby using two criteria/columns

Answers (1)

Related Questions