I have a MultiIndex DataFrame like the following:
I want to sort the posters in each group in descending order and keep the top 5. If a group has fewer than 5 posters, drop its records.
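```python
import pandas as pd

g = df.groupby(level=0)

def lrgst(df):
    # keep the 5 rows with the largest 'Time'; groups with fewer than 5 rows
    # fall through and return None
    if len(df) >= 5:
        return df.nlargest(5, 'Time')

# pd.concat silently skips the None results, so short groups are dropped
pd.concat([lrgst(d) for _, d in g])
```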
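A self-contained run of the `nlargest` approach above, using the sample `waller`/`poster`/`Time` data from the next answer as an assumed input (the question's own frame is not shown). Note that this ranks rows within each group by the `Time` column, not by `poster`, so ties in `Time` are resolved by original row order.

```python
import pandas as pd

# Assumed sample data, copied from the frame shown in the next answer.
index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)


def lrgst(group):
    """Return the 5 rows with the largest Time, or None for groups with < 5 rows."""
    if len(group) >= 5:
        return group.nlargest(5, "Time")
    return None  # pd.concat drops None entries silently


result = pd.concat([lrgst(d) for _, d in df.groupby(level=0)])
print(result)
```

Waller 2 has only one poster, so it disappears from the result; wallers 1 and 3 keep 5 rows each.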
Assuming you have the following DF:
```
In [97]: df
Out[97]:
               Time
waller poster
1      11         2
       22         3
       33         1
       44         1
       55         1
2      33         1
3      11         1
       22         1
       33         1
       44         2
       55         1
       66         3
```
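To reproduce the frame above, it can be built directly from `(waller, poster)` tuples. The values are copied from the printout; `MultiIndex.from_tuples` is just one convenient way to construct it.

```python
import pandas as pd

# Two index levels (waller, poster) and a single Time column,
# with values copied from the printout above.
index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# Posters per waller: 1 -> 5, 2 -> 1, 3 -> 6
print(df.groupby(level=0).size())
```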
Solution:
```
In [98]: (df.sort_index(ascending=[1,0])
    ...:    .groupby(level=0, as_index=False)
    ...:    .apply(lambda x: x.head(5) if len(x) >= 5 else x.head(0))
    ...:    .reset_index(level=0, drop=True)
    ...: )
    ...:
Out[98]:
               Time
waller poster
1      55         1
       44         1
       33         1
       22         3
       11         2
3      66         3
       55         1
       44         2
       33         1
       22         1
```
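The same result can also be sketched without `apply`, which sidesteps the extra index level that `as_index=False` adds: sort once, drop short groups with `groupby(...).filter`, then take the first 5 rows per group with `groupby(...).head`. This is an alternative formulation, not the answer's own code, and it assumes the sample frame above.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# waller ascending, poster descending within each waller
ordered = df.sort_index(ascending=[True, False])

# keep only waller groups that have at least 5 posters (filter preserves row order)
big_enough = ordered.groupby(level=0).filter(lambda g: len(g) >= 5)

# first 5 rows of each remaining group, i.e. the 5 highest poster ids
top5 = big_enough.groupby(level=0).head(5)
print(top5)
```

Both `filter` and `head` keep rows in the order of the frame they are called on, which is why sorting once up front is enough.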
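To sort by the `poster` level you can use `sortlevel`, which was deprecated in pandas 0.20 and later removed; `sort_index(level=...)` is its replacement:
```python
df.sort_index(level=1, ascending=False)
```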
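To get the top n results you can use `.head`. Called on the frame itself, `df.head(5)` returns only the first 5 rows overall; to get the first 5 rows of each group, call it on a groupby:
```python
df.groupby(level=0).head(5)
```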
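To drop records you can filter on the respective level with `get_level_values`, which returns one value per row (`df.index.levels[1]` holds only the level's unique values, so it cannot be used as a row mask):
```python
df = df[df.index.get_level_values(1) > 5]
```
To drop whole groups with fewer than 5 rows instead, use `df.groupby(level=0).filter(lambda g: len(g) >= 5)`.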
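A quick check of why the level must be read with `get_level_values` when building a boolean mask, on the sample frame from the previous answer (assumed input). The threshold of 30 is chosen only for illustration, because every poster id in the sample is greater than 5.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# index.levels[1] stores each distinct poster once: 6 values for 12 rows
print(len(df.index.levels[1]), len(df))

# get_level_values(1) has one entry per row, so it lines up with df
mask = df.index.get_level_values(1) > 30
filtered = df[mask]
print(filtered)
```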
Let me know if this helps. It's hard to say whether this answers your problem given the limited information.