Tabaraei
Tabaraei

Reputation: 495

Pandas - first n maximum values groupby pandas dataframe

I have a pandas dataframe as below:

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'b': [3, 2, 1, 4, 2, 1, 2, 2, 1]
})

Which gives me

>>> df
    a   b
0   1   3
1   1   2
2   1   1
3   1   4
4   1   2
5   2   1
6   2   2
7   2   2
8   2   1

I want to group the dataframe by column a, and return first N maximum values from column b of each group, ordered descending by that maximum value.

What have I done?

Let's consider I only want first two maximum values, I've done the following:

  1. Sort the dataframe by both columns in descending order
  2. Get first two values
  3. Since these values are order ascending, sort another time in descending order
df = df.sort_values(['a', 'b'], ascending=False)
df = df.groupby('a').nth([0,1])
df = df.sort_values(['a', 'b'], ascending=False).reset_index()

Which gives me

>>> df
    a   b
0   2   2
1   2   2
2   1   4
3   1   3

The output Works fine, but it's not the efficient way to do this.

I have also tried nlargest function, but it's not helping, because it drops other columns and returns only the b column.

df = df.groupby('a')['b'].nlargest(2)

Which gives me

>>> df
0    2
1    2
2    4
3    3
Name: b, dtype: int64

What is the best way to do this?

Upvotes: 2

Views: 823

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Sort the values by column b then group the dataframe and aggregate using head(n) to select the first n rows of each group

df.sort_values('b', ascending=False).groupby('a').head(2)

   a  b
3  1  4
0  1  3
6  2  2
7  2  2

Upvotes: 3

Related Questions