Reputation: 33
I have a database of baseball players and the teams they played for. The data frame can be simplified as follows:
df = pd.DataFrame({'teamID':['abc01','abc02','abc02'], 'playerID': ['p1','p2','p1'], 'score': [10,10,10] } )
teamID playerID score
abc01 p1 10
abc02 p2 10
abc02 p1 8
abc02 p3 1
abc03
...
zzz0z
Now I want to add three columns which respectively show the first, second and third biggest score of all players but grouped by the team they played. In particular, I would like the final table to look like this:
teamID first second last
abc01 10 10 10
abc02 10 8 1
abc03
...
...
zzz0z
I have tried several ways and it seems like the problem is a little bit complicated. Can anyone give me a hint?
Upvotes: 1
Views: 38
Reputation: 862691
First sorting values per multiple columns by DataFrame.sort_values
, create counter column by GroupBy.cumcount
, filter top3 by boolean indexing
and reshape by DataFrame.pivot
:
df = df.sort_values(['teamID','score'], ascending=[True, False])
df['g'] = df.groupby('teamID').cumcount()
df = df[df['g'] < 3]
df = df.pivot('teamID','g','score').ffill(axis=1).astype(int)
df.columns = ['first','second','last']
df = df.reset_index()
print (df)
teamID first second last
0 abc01 10 10 10
1 abc02 10 8 1
Upvotes: 2