Hoàng Sơn
Hoàng Sơn

Reputation: 33

Extract and put the three largest values to separate columns

I have a database of baseball players and the teams they played for. The data frame can be simplified as follows:

df = pd.DataFrame({'teamID':['abc01','abc02','abc02'], 'playerID': ['p1','p2','p1'], 'score': [10,10,10] } )
teamID  playerID    score
abc01   p1          10
abc02   p2          10
abc02   p1          8
abc02   p3          1
abc03
...
zzz0z

Now I want to add three columns which respectively show the first, second and third biggest score of all players but grouped by the team they played. In particular, I would like the final table to look like this:

teamID  first   second  last
abc01   10         10       10
abc02   10         8        1
abc03
...     
...
zzz0z

I have tried several ways and it seems like the problem is a little bit complicated. Can anyone give me a hint?

Upvotes: 1

Views: 38

Answers (1)

jezrael
jezrael

Reputation: 862691

First sorting values per multiple columns by DataFrame.sort_values, create counter column by GroupBy.cumcount, filter top3 by boolean indexing and reshape by DataFrame.pivot:

df = df.sort_values(['teamID','score'], ascending=[True, False])

df['g'] = df.groupby('teamID').cumcount()

df = df[df['g'] < 3]
df = df.pivot('teamID','g','score').ffill(axis=1).astype(int)
df.columns = ['first','second','last']
df = df.reset_index()
print (df)
  teamID  first  second  last
0  abc01     10      10    10
1  abc02     10       8     1

Upvotes: 2

Related Questions