李仁德

Reputation: 29

Pandas MultiIndex DataFrame: get the top 5 rows of each sorted group

I have a MultiIndex DataFrame like the following (originally posted as an image):

I want to sort each poster group in descending order and keep the top 5 rows. If a group has fewer than 5 posters, I want to drop that group entirely.

Upvotes: 1

Views: 1682

Answers (3)

piRSquared

Reputation: 294218

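import pandas as pd

g = df.groupby(level=0)

def lrgst(d):
    # Keep a group only if it has at least 5 rows, and return its 5 largest 'Time' rows
    if len(d) >= 5:
        return d.nlargest(5, 'Time')
    # Smaller groups fall through and return None, which pd.concat drops silently

pd.concat([lrgst(d) for _, d in g])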
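For a run that does not depend on the question's data, here is a minimal sketch of the same idea. The sample frame and the level names `waller` and `poster` are assumptions borrowed from the sample in another answer, since the original data was only posted as an image. The helper name `top5_by_time` is made up for illustration.

```python
import pandas as pd

# Assumed sample data: outer level "waller", inner level "poster", one column "Time"
idx = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=idx)


def top5_by_time(group):
    """Return the 5 rows with the largest Time, or None for groups under 5 rows."""
    if len(group) < 5:
        return None
    return group.nlargest(5, "Time")


# Apply the helper per outer-level group and skip the groups that were rejected
pieces = [top5_by_time(g) for _, g in df.groupby(level=0)]
result = pd.concat([p for p in pieces if p is not None])
print(result)
```

Filtering out the `None` entries explicitly is a small design choice: it makes the intent visible instead of relying on `pd.concat` discarding them.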

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210832

Assuming you have the following DF:

In [97]: df
Out[97]:
               Time
waller poster
1      11         2
       22         3
       33         1
       44         1
       55         1
2      33         1
3      11         1
       22         1
       33         1
       44         2
       55         1
       66         3

Solution:

In [98]: (df.sort_index(ascending=[True, False])
    ...:    .groupby(level=0, as_index=False)
    ...:    .apply(lambda x: x.head(5) if len(x) >= 5 else x.head(0))
    ...:    .reset_index(level=0, drop=True)
    ...: )
    ...:
Out[98]:
               Time
waller poster
1      55         1
       44         1
       33         1
       22         3
       11         2
3      66         3
       55         1
       44         2
       33         1
       22         1
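
A variant without `apply`, sketched on the same sample data: compute each row's group size with `transform('size')`, keep only rows from groups of at least five, then sort the index and take `head(5)` per group. The frame is rebuilt here so the snippet runs on its own. This is one possible approach, not necessarily a better one.

```python
import pandas as pd

# Rebuild the sample frame shown above
idx = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=idx)

# Size of the group each row belongs to, aligned with df's index
sizes = df.groupby(level="waller")["Time"].transform("size")

# Keep groups with at least 5 rows, sort waller ascending and poster descending,
# then take the first 5 rows of each group
top5 = (df[sizes >= 5]
        .sort_index(ascending=[True, False])
        .groupby(level="waller")
        .head(5))
print(top5)
```

Because `head` on a groupby keeps the original index, no `reset_index` step is needed afterwards.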

Upvotes: 5

madman75

Reputation: 56

To sort by the poster level, you can use sort_index on that level (sortlevel was deprecated and has since been removed from pandas):

df.sort_index(level=1, ascending=False)

To get the top n results you can use .head

df.head(5)

To drop records, you can filter on the values of the respective index level:

df = df[df.index.get_level_values(1) > 5]
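
A mask on the level values filters by the poster value itself, not by how many posters a group has. If the goal is to drop whole groups with fewer than five rows, `groupby(...).filter` is one option. A minimal sketch, assuming sample data with level names `waller` and `poster` and a `Time` column (these names are assumptions, since the question's data was posted as an image):

```python
import pandas as pd

# Assumed sample data: group 2 has a single row and should be dropped
idx = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=idx)

# Keep only the outer-level groups that contain at least 5 rows
kept = df.groupby(level="waller").filter(lambda g: len(g) >= 5)
print(kept)
```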

Let me know if this helps. It's hard to say whether this answers your problem given the limited information.

Upvotes: 0