Marla
Marla

Reputation: 340

Python percentile rank of a column, grouped by multiple other columns

I would like to group a pandas dataframe by multiple fields ('date' and 'category'), and for each group, rank values of another field ('value') by percentile, while retaining the original ('value') field.

I have attempted:

df2 = df.groupby(['date', 'category'])['value'].rank(pct=True)

But this returns only percentiles for the 'value' field.

Upvotes: 0

Views: 1754

Answers (1)

jezrael
jezrael

Reputation: 863801

I believe you need assign Series to new column:

df = pd.DataFrame({
         'value':[1,3,5,7,1,0],
         'category':[5] * 6,
         'date':list('aaabbb')
})


df['new'] = df.groupby(['date', 'category'])['value'].rank(pct=True)
print (df)
   value  category date       new
0      1         5    a  0.333333
1      3         5    a  0.666667
2      5         5    a  1.000000
3      7         5    b  1.000000
4      1         5    b  0.666667
5      0         5    b  0.333333

Alternative with DataFrame.assign:

df = df.assign(new= df.groupby(['date', 'category'])['value'].rank(pct=True))

Upvotes: 2

Related Questions