Reputation: 601
I have a dataframe which looks like this
pd.DataFrame({'A': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10'],
...: 'B': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
...: 'R': [9, 1, 7, 4, 3, 5, 2, 6, 8, 10]})
Out[3]:
A B R
0 C1 A 9
1 C2 A 1
2 C3 A 7
3 C4 B 4
4 C5 B 3
5 C6 B 5
6 C7 B 2
7 C8 C 6
8 C9 C 8
9 C10 C 10
column R is my rank column and I want to get the top 5 ranked items (column A), however, maximum of 3 items per group in column B can be selected.
I know I can do the following to select the top 5 ranked items
df.sort_values('R').head(5)
Out[10]:
A B R
1 C2 A 1
6 C7 B 2
4 C5 B 3
3 C4 B 4
5 C6 B 5
But this selects 4 items from group B. how can i restrict it to have only a maximum of 3 items per group selected? my resulting dataframe should look like this
A B R
1 C2 A 1
6 C7 B 2
4 C5 B 3
3 C4 B 4
5 C8 C 6
Logic - item C6 is not selected as it is the 4th item of group B so the next available item to be selected is C8 which has the next best rank and does not breach the group limitation.
Upvotes: 6
Views: 2828
Reputation: 5183
top = df.merge(
df.groupby('B').R.nsmallest(3) # get the 3 top ranked rows for each group
.reset_index('B'),
# `nsmallest` will return a new df with B and df.index as MultiIndex
# so we reset B to a column
# however column A is not in this new df, so we merge with the original df
how='right') # and drop any rows not in the new df
Output
A B R
0 C2 A 1
1 C3 A 7
2 C1 A 9
3 C7 B 2
4 C5 B 3
5 C4 B 4
6 C8 C 6
7 C9 C 8
8 C10 C 10
Upvotes: 0
Reputation: 30920
We can try with GroupBy.head
new_df = df.sort_values('R').groupby('B', sort=False).head(3).head(5)
print(new_df)
A B R
1 C2 A 1
6 C7 B 2
4 C5 B 3
3 C4 B 4
7 C8 C 6
Upvotes: 8