Take Smallest Group by Average

Question

I'm looking to filter a set for the group with the smallest average value. In the example below, we'd group by ID, get the average year, and then only keep the group with the smallest average year. This feels basic, but I haven't seen an answer for it yet.

df = pd.DataFrame({'ID':[1,1,2,2,3,3], 
                   'YEAR' : [2011,2012,2012,2013,2013,2014], 
                   'V': [0,1,1,0,1,0],
                   'C':[00,11,22,33,44,55]})

Mustafa Aydın · Accepted Answer

In [180]: min_id = df.groupby("ID")["YEAR"].mean().idxmin()

In [181]: df.query("ID == @min_id")
Out[181]:
   ID  YEAR  V   C
0   1  2011  0   0
1   1  2012  1  11

get the index of the minimum mean year having group
filter the frame where ID is equal to that

this is how the aggregated mean looks like:

In [182]: df.groupby("ID")["YEAR"].mean()
Out[182]:
ID
1    2011.5
2    2012.5
3    2013.5
Name: YEAR, dtype: float64

so the minimum is 2011.5, index of it is 1 (the ID).

Take Smallest Group by Average

Answers (2)

Related Questions