How to get top n rows with a max limit by group in pandas

Question

I have a dataframe which looks like this

pd.DataFrame({'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
  ...:                    'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
  ...:                    'R': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10]})
Out[3]: 
     A  B   R
0   C1  A   9
1   C2  A   1
2   C3  A   7
3   C4  B   4
4   C5  B   3
5   C6  B   5
6   C7  B   2
7   C8  C   6
8   C9  C   8
9  C10  C  10

column R is my rank column and I want to get the top 5 ranked items (column A), however, maximum of 3 items per group in column B can be selected.

I know I can do the following to select the top 5 ranked items

df.sort_values('R').head(5)
Out[10]: 
    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
5  C6  B  5

But this selects 4 items from group B. how can i restrict it to have only a maximum of 3 items per group selected? my resulting dataframe should look like this

    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
5  C8  C  6

Logic - item C6 is not selected as it is the 4th item of group B so the next available item to be selected is C8 which has the next best rank and does not breach the group limitation.

ansev · Accepted Answer

We can try with GroupBy.head

new_df = df.sort_values('R').groupby('B', sort=False).head(3).head(5)
print(new_df)

    A  B  R
1  C2  A  1
6  C7  B  2
4  C5  B  3
3  C4  B  4
7  C8  C  6

How to get top n rows with a max limit by group in pandas

Answers (2)

Related Questions