BhishanPoudel
BhishanPoudel

Reputation: 17154

How to aggregate two largest values per group in pandas?

I was going through this link: Return top N largest values per group using pandas

and found multiple ways to find the topN values per group.

However, I prefer dictionary method with agg function and would like to know if it is possible to get the equivalent of the dictionary method for the following problem?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1,   1,   1,   2,   2],
                   'B': [1,   1,   2,   2,   1],
                   'C': [10,  20,  30,  40,  50],
                   'D': ['X', 'Y', 'X', 'Y', 'Y']})
print(df)
   A  B   C  D
0  1  1  10  X
1  1  1  20  Y
2  1  2  30  X
3  2  2  40  Y
4  2  1  50  Y

I can do this:

df1 = df.groupby(['A'])['C'].nlargest(2).droplevel(-1).reset_index()
print(df1)
   A   C
0  1  30
1  1  20
2  2  50
3  2  40

# also this
df1 = df.sort_values('C', ascending=False).groupby('A', sort=False).head(2)
print(df1)

# also this
df.set_index('C').groupby('A')['B'].nlargest(2).reset_index()

Required

df.groupby('A',as_index=False).agg(
    {'C': lambda ser: ser.nlargest(2)  # something like this
    })

Is it possible to use the dictionary here?

Upvotes: 0

Views: 563

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30991

If you want to get a dictionary like A: 2 top values from C, you can run:

df.groupby(['A'])['C'].apply(lambda x:
    x.nlargest(2).tolist()).to_dict()

For your DataFrame, the result is:

{1: [30, 20], 2: [50, 40]}

Upvotes: 2

Related Questions