irene
irene

Reputation: 2253

Performing functions on multiindex in groupby

I have a dataframe with a MultiIndex. Here's a minimal working example:

df = pd.DataFrame({'note':[1,1,1,2,2,2,2],'t': [0.5,0.7,1.2,0.3,0.9,1.3,1.7],'val':[1,-1,0,0,1,0,0]})
dfs = df.set_index(['note','t'])

which gives

>>> dfs
          val
note t
1    0.5    1
     0.7   -1
     1.2    0
2    0.3    0
     0.9    1
     1.3    0
     1.7    0

what I want is to get (a) the minimum value and (b) the first value in the t index per group:

note min  first
1    0.5  0.5
2    0.3  0.3

I could do a groupby on the original dataframe df where note and t are columns and not indices:

df.groupby('note').agg({'t': [min, lambda x: list(x)[0]]})

but I'd rather not do a reset_index() followed by another set_index() to restore the dataframe to the MultiIndex version. How do I do this? The agg function only works on columns and not the indices.

Upvotes: 2

Views: 178

Answers (2)

Andy L.
Andy L.

Reputation: 25269

You may work directly on the multiindex. Use to_series to convert dfs.index to series of multiindex and tuple values. Next, slice to get last evelemnt of the tuple values. After that just do groupby on level=0 and agg like regular column

dfs.index.to_series().str[-1].groupby(level=0).agg(['min', 'first'])

Out[463]:
      min  first
note
1     0.5    0.5
2     0.3    0.3

Upvotes: 1

jezrael
jezrael

Reputation: 863531

It is possible, but not very clean:

df = (dfs.index.get_level_values(1).to_series()
         .groupby(dfs.index.get_level_values(0))
         .agg(['min', 'first']))
print (df)
      min  first
note            
1     0.5    0.5
2     0.3    0.3

df = dfs.reset_index('t').groupby(level=0)['t'].agg(['min', 'first'])
print (df)
      min  first
note            
1     0.5    0.5
2     0.3    0.3

Upvotes: 3

Related Questions