Pandas - first n maximum values groupby pandas dataframe

Question

I have a pandas dataframe as below:

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'b': [3, 2, 1, 4, 2, 1, 2, 2, 1]
})

Which gives me

I want to group the dataframe by column a, and return first N maximum values from column b of each group, ordered descending by that maximum value.

What have I done?

Let's consider I only want first two maximum values, I've done the following:

Sort the dataframe by both columns in descending order
Get first two values
Since these values are order ascending, sort another time in descending order

df = df.sort_values(['a', 'b'], ascending=False)
df = df.groupby('a').nth([0,1])
df = df.sort_values(['a', 'b'], ascending=False).reset_index()

Which gives me

The output Works fine, but it's not the efficient way to do this.

I have also tried nlargest function, but it's not helping, because it drops other columns and returns only the b column.

df = df.groupby('a')['b'].nlargest(2)

Which gives me

>>> df
0    2
1    2
2    4
3    3
Name: b, dtype: int64

What is the best way to do this?

Shubham Sharma · Accepted Answer

Sort the values by column b then group the dataframe and aggregate using head(n) to select the first n rows of each group

df.sort_values('b', ascending=False).groupby('a').head(2)

Pandas - first n maximum values groupby pandas dataframe

What have I done?

Answers (1)

Related Questions