pyds_learner

Create a score column in pandas whose value depends on the percentile of another column

I have the following dataframe:

User_ID  Game_ID  votes
1        11       1040
1        11       nan
1        22       1101
1        11       540
1        33       nan
2        33       nan
2        33       290
2        33       nan

I need to create a new column whose value depends on where each value of the votes column falls relative to that column's percentiles, according to the following rules:

If the votes value is >= the 75th percentile, assign a score of 2.

If it is >= the 25th percentile (but below the 75th), assign a score of 1.

If it is < the 25th percentile, assign a score of 0.
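For reference, the three rules can also be applied in one vectorized step. This is a minimal sketch, not taken from either answer below. It assumes that missing votes should stay missing (the question doesn't say what NaN should become) and that pandas' default linear interpolation is an acceptable definition of "percentile". It uses Series.quantile and numpy.select:

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks a missing vote.
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Game_ID": [11, 11, 22, 11, 33, 33, 33, 33],
    "votes": [1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan],
})

# Series.quantile skips NaN and interpolates linearly by default.
p25, p75 = df["votes"].quantile([0.25, 0.75])  # 477.5 and 1055.25 here

# np.select checks the conditions in order; the first one that holds wins.
conditions = [
    df["votes"] >= p75,  # at or above the 75th percentile -> 2
    df["votes"] >= p25,  # at or above the 25th (and below the 75th) -> 1
    df["votes"] < p25,   # below the 25th percentile -> 0
]
# NaN votes satisfy no condition and get the default (NaN, an assumption).
df["score"] = np.select(conditions, [2, 1, 0], default=np.nan)
print(df)
```

Because the default is NaN, the score column comes out as float; cast it or pick another default (e.g. -1) if integers are needed.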

Answers (2)

C.Acarbay

You can get the percentiles by calling describe and then use a list comprehension:

percentiles = df.votes.describe()
df['scores'] = [2 if x >= percentiles['75%'] else (0 if x < percentiles['25%'] else 1) for x in df.votes]
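One thing to watch with the comprehension: a NaN vote fails both comparisons (x >= ... and x < ... are both False for NaN), so it falls through to the else branch and gets a score of 1. The sketch below shows that on the question's votes, plus a guarded variant that keeps missing votes as NaN. Keeping them NaN is an assumption; the question doesn't specify the desired behavior.

```python
import numpy as np
import pandas as pd

votes = pd.Series([1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan])
percentiles = votes.describe()  # '25%' and '75%' are computed from non-NaN values

# Comprehension as written above: NaN fails both tests and ends up as 1.
naive = [2 if x >= percentiles['75%'] else (0 if x < percentiles['25%'] else 1)
         for x in votes]

# Guarded variant: check for a missing vote first and leave it missing.
guarded = [np.nan if pd.isna(x)
           else 2 if x >= percentiles['75%']
           else 0 if x < percentiles['25%']
           else 1
           for x in votes]

print(naive)  # [1, 1, 2, 1, 1, 1, 0, 1]
print(guarded)
```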

gmds

Use pd.qcut:

df['score'] = pd.qcut(df['votes'].astype(float), [0, 0.25, 0.75, 1.0]).cat.codes
print(df['score'])

Output (NaN votes get the code -1):

0    1
1   -1
2    2
3    1
4   -1
5   -1
6    0
7   -1
dtype: int8
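A caveat with qcut: its bins are closed on the right, so a value exactly equal to the 25th or 75th percentile lands in the lower bin. That differs from the question's ">=" rules. It makes no difference on the question's data, but the sketch below uses hypothetical votes chosen to hit the percentiles exactly. It compares qcut with pd.cut(..., right=False), which builds left-closed bins [p25, p75) from the same thresholds:

```python
import numpy as np
import pandas as pd

# Hypothetical votes: the 25th and 75th percentiles are exactly 20 and 40.
votes = pd.Series([10, 20, 30, 40, 50, np.nan], dtype=float)
p25, p75 = votes.quantile([0.25, 0.75])

# qcut: bins are (a, b], so 20 and 40 stay in the lower bin.
qcut_codes = pd.qcut(votes, [0, 0.25, 0.75, 1.0]).cat.codes

# cut with right=False: bins are [a, b), matching ">= 25th" and ">= 75th".
cut_codes = pd.cut(votes, [-np.inf, p25, p75, np.inf], right=False).cat.codes

print(qcut_codes.tolist())  # [0, 0, 1, 1, 2, -1]
print(cut_codes.tolist())   # [0, 1, 1, 2, 2, -1]
```

In both cases NaN still maps to code -1.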
