Zephyr
Zephyr

Reputation: 1352

Filtering Max 3 count using groupby in pandas

I am working with a dataframe that contains 20K rows. I created a sample dataframe as follow to replicate the data frame.

df = pd.DataFrame()
df ['Team'] = ['A1','A1','A1','A2','A2','A2','B1','B1','B1','B2','B2','B2']
df ['Competition'] = ['L1','L1','L1','L1','L1','L1','L2','L2','L2','L2','L2','L2']
df ['Score_count'] = [2,1,3,4,7,8,1,5,8,5,7,1]

I would like to keep the rows where two maximum values of Score_count by using groupby(['Competition','Team'])

I am able to keep the rows with maximum Score_count by using transform(max) as follow:

idx = df.groupby(['Competition','Team'])['Score_count'].transform(max) == df['Score_count']
df = df[idx]

But what I wanted to do is to keep n numbers of maximum (in this case is two maximum values) Score_count for the same groupby.

How can I do it?

Below is my expected output:

       Team     Competition  Score_count
0       A1          L1             3
1       A1          L1             2
2       A2          L1             8
3       A2          L1             7
4       B1          L2             8
5       B1          L2             5
6       B2          L2             7
7       B2          L2             5

You may also refer to the picture below for the expected output:

enter image description here

Can anyone advise how to do it? Thanks,

Zep

Upvotes: 0

Views: 351

Answers (1)

akuiper
akuiper

Reputation: 214987

groupby Team and Competition and then take the two largest values with .nlargest:

df.groupby(['Team', 'Competition']).Score_count.nlargest(2).reset_index([0,1])

#   Team Competition  Score_count
#2    A1          L1            3
#0    A1          L1            2
#5    A2          L1            8
#4    A2          L1            7
#8    B1          L2            8
#7    B1          L2            5
#10   B2          L2            7
#9    B2          L2            5

To drop the original index:

df.groupby(['Team', 'Competition']).Score_count.nlargest(2).reset_index([0,1]).reset_index(drop=True)

#  Team Competition  Score_count
#0   A1          L1            3
#1   A1          L1            2
#2   A2          L1            8
#3   A2          L1            7
#4   B1          L2            8
#5   B1          L2            5
#6   B2          L2            7
#7   B2          L2            5

Upvotes: 2

Related Questions