Reputation: 769
I'm looking to filter a set for the group with the smallest average value. In the example below, we'd group by ID, get the average year, and then only keep the group with the smallest average year. This feels basic, but I haven't seen an answer for it yet.
df = pd.DataFrame({'ID':[1,1,2,2,3,3],
'YEAR' : [2011,2012,2012,2013,2013,2014],
'V': [0,1,1,0,1,0],
'C':[00,11,22,33,44,55]})
Upvotes: 0
Views: 36
Reputation: 53
groups = df.groupby(['ID']).mean()
Then
groups.loc[groups['YEAR'] == groups['YEAR'].min()]
Upvotes: 1
Reputation: 18306
In [180]: min_id = df.groupby("ID")["YEAR"].mean().idxmin()
In [181]: df.query("ID == @min_id")
Out[181]:
ID YEAR V C
0 1 2011 0 0
1 1 2012 1 11
this is how the aggregated mean looks like:
In [182]: df.groupby("ID")["YEAR"].mean()
Out[182]:
ID
1 2011.5
2 2012.5
3 2013.5
Name: YEAR, dtype: float64
so the minimum is 2011.5, index of it is 1 (the ID).
Upvotes: 4