Reputation: 2016
I have the below dataframe. I am trying to calculate what percentile the 'recency' field is in and add it as it's own field. I've been trying the quantiles function in Pandas, but get the NaN output shown below.
Can anyone advise me on how I might do this?
id recency frequency monetary recencypct
0 1 61 61 6052.50 NaN
1 2 43 97 1743.79 NaN
2 3 71 55 7293.29 NaN
3 4 32 77 4493.31 NaN
4 5 7 14 2036.86 NaN
5 6 57 41 1380.94 NaN
6 7 12 47 9451.65 NaN
7 8 98 12 8687.91 NaN
8 9 24 90 6350.54 NaN
9 10 41 8 599.80 NaN
10 11 61 17 212.13 NaN
11 12 29 89 8501.65 NaN
12 13 9 27 7165.10 NaN
13 14 77 31 6011.45 NaN
14 15 37 8 9491.75 NaN
15 16 100 76 1894.23 NaN
16 17 25 8 5753.13 NaN
17 18 19 45 333.16 NaN
18 19 14 90 8762.78 NaN
19 20 16 20 231.76 NaN
Upvotes: 1
Views: 332
Reputation: 160
If the data frame is called df
, then try:
df['recencypct'] = df.recency.rank(pct=True)
Output (pretty printed):
+----+------+-----------+-------------+------------+--------------+
| | id | recency | frequency | monetary | recencypct |
|----+------+-----------+-------------+------------+--------------|
| 0 | 1 | 61 | 61 | 6052.5 | 0.775 |
| 1 | 2 | 43 | 97 | 1743.79 | 0.65 |
| 2 | 3 | 71 | 55 | 7293.29 | 0.85 |
| 3 | 4 | 32 | 77 | 4493.31 | 0.5 |
| 4 | 5 | 7 | 14 | 2036.86 | 0.05 |
| 5 | 6 | 57 | 41 | 1380.94 | 0.7 |
| 6 | 7 | 12 | 47 | 9451.65 | 0.15 |
| 7 | 8 | 98 | 12 | 8687.91 | 0.95 |
| 8 | 9 | 24 | 90 | 6350.54 | 0.35 |
| 9 | 10 | 41 | 8 | 599.8 | 0.6 |
| 10 | 11 | 61 | 17 | 212.13 | 0.775 |
| 11 | 12 | 29 | 89 | 8501.65 | 0.45 |
| 12 | 13 | 9 | 27 | 7165.1 | 0.1 |
| 13 | 14 | 77 | 31 | 6011.45 | 0.9 |
| 14 | 15 | 37 | 8 | 9491.75 | 0.55 |
| 15 | 16 | 100 | 76 | 1894.23 | 1 |
| 16 | 17 | 25 | 8 | 5753.13 | 0.4 |
| 17 | 18 | 19 | 45 | 333.16 | 0.3 |
| 18 | 19 | 14 | 90 | 8762.78 | 0.2 |
| 19 | 20 | 16 | 20 | 231.76 | 0.25 |
+----+------+-----------+-------------+------------+--------------+
Reference: http://www.datasciencemadesimple.com/percentile-rank-column-pandas-python-2/
Upvotes: 3