Dima Lituiev

Reputation: 13116

How to group by a hierarchical column in pandas?

I have a data frame with hierarchical column indices. Now I want to group it by the column ('X', 'chromosome'). Is there a way to do this without changing the structure of the data frame?

import pandas as pd

X = pd.DataFrame.from_dict({'chromosome': ['chr1', 'chr2', 'chr2', 'chr2'], 'start': [1, 2, 1, 4]})
Y = pd.DataFrame.from_dict({'chromosome': ['chr1', 'chr2', 'chr2', 'chr3'], 'start': [4, 5, 6, 1]})
df_stats = pd.DataFrame.from_dict({'pvalue': [1e-30, 1e-3, 1e-10, 1e-40], 't-stat': [4.4, 5.5, 6.6, 7.7]})

dd = {'X': X, 'Y': Y, 'STATS': df_stats}
df_qtls = pd.concat(dd.values(), axis=1, keys=list(dd.keys()))
df_qtls

for n, g in df_qtls.groupby(['X', 'chromosome']):
    print(n, g)

This results in an error:

...
ValueError: Grouper for 'X' not 1-dimensional
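A minimal sketch of why the message names 'X': a list passed to groupby is read as one key per element, so 'X' is looked up on its own. This rebuilds the question's frame; the print calls are only for illustration.

```python
import pandas as pd

# Rebuild the frame from the question (same data, same keys).
X = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr2'], 'start': [1, 2, 1, 4]})
Y = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr3'], 'start': [4, 5, 6, 1]})
df_stats = pd.DataFrame({'pvalue': [1e-30, 1e-3, 1e-10, 1e-40], 't-stat': [4.4, 5.5, 6.6, 7.7]})
dd = {'X': X, 'Y': Y, 'STATS': df_stats}
df_qtls = pd.concat(list(dd.values()), axis=1, keys=list(dd.keys()))

# With two-level columns, df_qtls['X'] selects every column under 'X',
# which is a 2-D DataFrame and therefore cannot serve as a grouper.
print(type(df_qtls['X']).__name__, df_qtls['X'].ndim)

# The full tuple addresses exactly one column: a 1-D Series.
print(type(df_qtls[('X', 'chromosome')]).__name__, df_qtls[('X', 'chromosome')].ndim)
```

The first print shows a DataFrame with ndim 2 and the second a Series with ndim 1, which is the distinction the error message is about.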

Upvotes: 3

Views: 4514

Answers (2)

Dima Lituiev

Reputation: 13116

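Another way I found is to pass the column itself, selected as a Series, as the grouper:

x_pos_cols = 'X'  # assumed: the top-level column label to group on
for n, g in df_qtls.groupby(df_qtls[x_pos_cols, 'chromosome']):
    print(n)
    print(g)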
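The same Series-as-grouper idea extends to several hierarchical columns at once. A sketch, assuming the question's data; `groupers` and `sizes` are illustrative names, and grouping on both chromosome columns is only an example:

```python
import pandas as pd

# Rebuild the frame from the question.
X = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr2'], 'start': [1, 2, 1, 4]})
Y = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr3'], 'start': [4, 5, 6, 1]})
df_stats = pd.DataFrame({'pvalue': [1e-30, 1e-3, 1e-10, 1e-40], 't-stat': [4.4, 5.5, 6.6, 7.7]})
dd = {'X': X, 'Y': Y, 'STATS': df_stats}
df_qtls = pd.concat(list(dd.values()), axis=1, keys=list(dd.keys()))

# Each grouper is a 1-D Series aligned on the row index, so a list of them
# groups on both chromosome columns without touching the column MultiIndex.
groupers = [df_qtls[('X', 'chromosome')], df_qtls[('Y', 'chromosome')]]

for n, g in df_qtls.groupby(groupers):
    # n is an (X chromosome, Y chromosome) tuple; g keeps the two-level columns.
    print(n)
    print(g)

# Number of rows per (X, Y) chromosome pair.
sizes = df_qtls.groupby(groupers).size()
print(sizes)
```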

Upvotes: 0

Jianxun Li

Reputation: 24752

For multi-level columns, use the tuple ('X', 'chromosome') to refer to a particular column.

for n, g in df_qtls.groupby([('X', 'chromosome')]):
    print(n)
    print(g)

chr1
           Y                X             STATS       
  chromosome start chromosome start      pvalue t-stat
0       chr1     4       chr1     1  1.0000e-30    4.4
chr2
           Y                X             STATS       
  chromosome start chromosome start      pvalue t-stat
1       chr2     5       chr2     2  1.0000e-03    5.5
2       chr2     6       chr2     1  1.0000e-10    6.6
3       chr3     1       chr2     4  1.0000e-40    7.7        
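If the goal is to reduce each group to a single row while keeping the hierarchical columns, one option is to compute row labels per group with idxmin and select them with .loc. This is a sketch only; picking the lowest p-value per ('X', 'chromosome') is an illustrative choice, and `best_rows` and `best` are hypothetical names.

```python
import pandas as pd

# Rebuild the frame from the question.
X = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr2'], 'start': [1, 2, 1, 4]})
Y = pd.DataFrame({'chromosome': ['chr1', 'chr2', 'chr2', 'chr3'], 'start': [4, 5, 6, 1]})
df_stats = pd.DataFrame({'pvalue': [1e-30, 1e-3, 1e-10, 1e-40], 't-stat': [4.4, 5.5, 6.6, 7.7]})
dd = {'X': X, 'Y': Y, 'STATS': df_stats}
df_qtls = pd.concat(list(dd.values()), axis=1, keys=list(dd.keys()))

chrom = df_qtls[('X', 'chromosome')]
pvalues = df_qtls[('STATS', 'pvalue')]

# For each chromosome, idxmin returns the row label of the smallest p-value.
best_rows = pvalues.groupby(chrom).idxmin()

# Selecting those row labels with .loc keeps the original two-level columns.
best = df_qtls.loc[best_rows]
print(best)
```

Because the reduction happens on a single Series and the full frame is only indexed by row labels, the column MultiIndex is never reshaped.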

Upvotes: 6
