kikee1222
kikee1222

Reputation: 2016

Calculating percentiles as a column in Pandas

I have the below dataframe. I am trying to calculate what percentile the 'recency' field is in and add it as it's own field. I've been trying the quantiles function in Pandas, but get the NaN output shown below.

Can anyone advise me on how I might do this?

       id  recency  frequency  monetary  recencypct
0       1       61         61   6052.50         NaN
1       2       43         97   1743.79         NaN
2       3       71         55   7293.29         NaN
3       4       32         77   4493.31         NaN
4       5        7         14   2036.86         NaN
5       6       57         41   1380.94         NaN
6       7       12         47   9451.65         NaN
7       8       98         12   8687.91         NaN
8       9       24         90   6350.54         NaN
9      10       41          8    599.80         NaN
10     11       61         17    212.13         NaN
11     12       29         89   8501.65         NaN
12     13        9         27   7165.10         NaN
13     14       77         31   6011.45         NaN
14     15       37          8   9491.75         NaN
15     16      100         76   1894.23         NaN
16     17       25          8   5753.13         NaN
17     18       19         45    333.16         NaN
18     19       14         90   8762.78         NaN
19     20       16         20    231.76         NaN

Upvotes: 1

Views: 332

Answers (1)

Isaac B
Isaac B

Reputation: 160

If the data frame is called df, then try:

df['recencypct'] = df.recency.rank(pct=True)

Output (pretty printed):

+----+------+-----------+-------------+------------+--------------+
|    |   id |   recency |   frequency |   monetary |   recencypct |
|----+------+-----------+-------------+------------+--------------|
|  0 |    1 |        61 |          61 |    6052.5  |        0.775 |
|  1 |    2 |        43 |          97 |    1743.79 |        0.65  |
|  2 |    3 |        71 |          55 |    7293.29 |        0.85  |
|  3 |    4 |        32 |          77 |    4493.31 |        0.5   |
|  4 |    5 |         7 |          14 |    2036.86 |        0.05  |
|  5 |    6 |        57 |          41 |    1380.94 |        0.7   |
|  6 |    7 |        12 |          47 |    9451.65 |        0.15  |
|  7 |    8 |        98 |          12 |    8687.91 |        0.95  |
|  8 |    9 |        24 |          90 |    6350.54 |        0.35  |
|  9 |   10 |        41 |           8 |     599.8  |        0.6   |
| 10 |   11 |        61 |          17 |     212.13 |        0.775 |
| 11 |   12 |        29 |          89 |    8501.65 |        0.45  |
| 12 |   13 |         9 |          27 |    7165.1  |        0.1   |
| 13 |   14 |        77 |          31 |    6011.45 |        0.9   |
| 14 |   15 |        37 |           8 |    9491.75 |        0.55  |
| 15 |   16 |       100 |          76 |    1894.23 |        1     |
| 16 |   17 |        25 |           8 |    5753.13 |        0.4   |
| 17 |   18 |        19 |          45 |     333.16 |        0.3   |
| 18 |   19 |        14 |          90 |    8762.78 |        0.2   |
| 19 |   20 |        16 |          20 |     231.76 |        0.25  |
+----+------+-----------+-------------+------------+--------------+

Reference: http://www.datasciencemadesimple.com/percentile-rank-column-pandas-python-2/

Upvotes: 3

Related Questions