idt_tt
idt_tt

Reputation: 601

How to get top n rows with a max limit by group in pandas

I have a dataframe which looks like this

pd.DataFrame({'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
  ...:                    'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
  ...:                    'R': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10]})
Out[3]: 
     A  B   R
0   C1  A   9
1   C2  A   1
2   C3  A   7
3   C4  B   4
4   C5  B   3
5   C6  B   5
6   C7  B   2
7   C8  C   6
8   C9  C   8
9  C10  C  10

column R is my rank column and I want to get the top 5 ranked items (column A), however, maximum of 3 items per group in column B can be selected.

I know I can do the following to select the top 5 ranked items

df.sort_values('R').head(5)
Out[10]: 
    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
5  C6  B  5

But this selects 4 items from group B. how can i restrict it to have only a maximum of 3 items per group selected? my resulting dataframe should look like this

    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
5  C8  C  6

Logic - item C6 is not selected as it is the 4th item of group B so the next available item to be selected is C8 which has the next best rank and does not breach the group limitation.

Upvotes: 6

Views: 2828

Answers (2)

RichieV
RichieV

Reputation: 5183

top = df.merge(
    df.groupby('B').R.nsmallest(3) # get the 3 top ranked rows for each group
        .reset_index('B'),
        # `nsmallest` will return a new df with B and df.index as MultiIndex
        # so we reset B to a column
        # however column A is not in this new df, so we merge with the original df
    how='right') # and drop any rows not in the new df

Output

     A  B   R
0   C2  A   1
1   C3  A   7
2   C1  A   9
3   C7  B   2
4   C5  B   3
5   C4  B   4
6   C8  C   6
7   C9  C   8
8  C10  C  10

Upvotes: 0

ansev
ansev

Reputation: 30920

We can try with GroupBy.head

new_df = df.sort_values('R').groupby('B', sort=False).head(3).head(5)
print(new_df)

    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
7  C8  C  6

Upvotes: 8

Related Questions