Nick S

Reputation: 422

Pandas Binning for different sets

I have a dataframe of baseball players and some of their stats. For example

id | position    | gamesPlayed
---------------------------------
1  | First Base  | 100
2  | First Base  | 3
3  | First Base  | 45
4  | First Base  | 162
5  | Second Base | 145
6  | Second Base | 120
7  | Second Base | 6
8  | Second Base | 88

I can bin the gamesPlayed for all positions by doing something like:

labels = ['everyday','platoon','bench','scrub']
df['playing_time'] = pd.qcut(df['gamesPlayed'], q=4, labels=labels)
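
As a self-contained sketch of that global binning, here it is run on the sample rows above (the DataFrame construction is only a reconstruction of the table, not my real data). One thing it makes visible: pd.qcut hands out labels from the lowest bin to the highest, so with the labels in this order the players with the fewest games end up as 'everyday'.

```python
import pandas as pd

# Sample rows reconstructed from the table in the question (illustrative only)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "position": ["First Base"] * 4 + ["Second Base"] * 4,
    "gamesPlayed": [100, 3, 45, 162, 145, 120, 6, 88],
})

labels = ['everyday', 'platoon', 'bench', 'scrub']

# Quartiles are computed over ALL players, ignoring position.
# Labels are assigned in ascending order of gamesPlayed.
df['playing_time'] = pd.qcut(df['gamesPlayed'], q=4, labels=labels)

print(df)
```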

But I'd prefer to label playing time based on position. I can do this for every position like:

pt1B = pd.qcut(df[df['position']=='First Base']['gamesPlayed'], q=4, labels=labels)
pt2B = pd.qcut(df[df['position']=='Second Base']['gamesPlayed'], q=4, labels=labels)

But then updating the dataframe with this playing time label is a bit cumbersome, as I have to go through these steps:

pt1B.rename("playing_time",inplace=True)
pt2B.rename("playing_time",inplace=True)
df['playing_time'] = ''
df.update(pt1B)
df.update(pt2B)

I'm sure there is a way to do this more concisely but for the life of me I just haven't been able to figure it out! Any suggestions?

Upvotes: 1

Views: 60

Answers (1)

rhug123

Reputation: 8768

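I believe the code below should work. pd.qcut assigns labels from the lowest bin to the highest, so I added [::-1] to the end of your list to reverse the order; otherwise the players with the fewest games would be labeled 'everyday'. Grouping by position and using transform computes the quartiles separately within each position and returns the labels aligned to the original index.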

labels = ['everyday','platoon','bench','scrub'][::-1]

df['category'] = df.groupby('position')['gamesPlayed'].transform(lambda x: pd.qcut(x,q=4, labels=labels))
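
Here is a minimal runnable sketch of the same idea on the sample data from the question. The DataFrame construction is my reconstruction of the posted table, and I used playing_time as the column name to match the question; adjust both to your real data. With only four players per position, each one lands in its own quartile, so real data will give bigger bins.

```python
import pandas as pd

# Sample rows reconstructed from the table in the question (illustrative only)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "position": ["First Base"] * 4 + ["Second Base"] * 4,
    "gamesPlayed": [100, 3, 45, 162, 145, 120, 6, 88],
})

# Reversed so the lowest quartile is 'scrub' and the highest is 'everyday'
labels = ['everyday', 'platoon', 'bench', 'scrub'][::-1]

# qcut runs once per position; transform returns a result aligned to df's index
df['playing_time'] = (
    df.groupby('position')['gamesPlayed']
      .transform(lambda x: pd.qcut(x, q=4, labels=labels))
)

print(df)
```

One caveat: if a position has many tied gamesPlayed values, pd.qcut can raise an error about duplicate bin edges, and you would need to decide how to handle that for your data.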

Upvotes: 1
