alphanumeric

Reputation: 19349

How to get integers instead of floats from DataFrame's rank method

To substitute the numbers with their corresponding "ranks":

import pandas as pd
import numpy as np

numbers = np.random.randint(low=0, high=10001, size=1000)  # random_integers is deprecated; randint's upper bound is exclusive
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank()

I am getting float values, which is the default output type of the rank method:

987     82.0
988     36.5
989    526.0
990    219.0
991    957.0
992    819.5
993    787.5
994    513.0

I would rather have integers instead of floats. Converting the resulting float values with astype(int) seems risky to me, since it might introduce duplicates from float values that are close to each other, such as 3.5 and 4.0. I would expect both of those to end up as the integer 4.

Is there any way to make the rank method output integers?

Upvotes: 5

Views: 3435

Answers (3)

Sam

Reputation: 518

No need to use method='dense', just convert to an integer.

df['a_rank'] = df['a'].rank().astype(int)
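
To see what this conversion does to tied values, here is a minimal sketch on a small hand-made Series (the data and the variable names are made up for illustration). With the default method='average', ties get a fractional rank, and astype(int) truncates it rather than rounding:

```python
import pandas as pd

# Small illustrative Series with one pair of tied values (20 and 20).
s = pd.Series([10, 20, 20, 30])

# Default method='average': the two 20s share the mean of positions 2 and 3.
avg = s.rank()               # 1.0, 2.5, 2.5, 4.0

# astype(int) truncates toward zero, so 2.5 becomes 2.
truncated = avg.astype(int)  # 1, 2, 2, 4

print(truncated.tolist())
```

In this example the tied values still share one rank and the distinct values stay distinct, but the ranks are no longer averages and rank 3 is skipped.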

Upvotes: 0

whisperer

Reputation: 587

The above solution did not work for me, but the following did. The critical line, with the edit, is:

df['a_rank'] = df['a'].rank(method='dense').astype(int)

This could be a version issue.
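
For comparison, here is a rough sketch of how a few tie-handling methods behave once cast to int. The small Series is invented for illustration. As far as I can tell, 'dense', 'min' and 'first' all produce whole-number ranks, so the cast does not lose anything:

```python
import pandas as pd

# Illustrative data with one tie.
s = pd.Series([10, 20, 20, 30])

# 'dense': ties share a rank, and the next rank is always +1.
dense = s.rank(method='dense').astype(int)   # 1, 2, 2, 3

# 'min': ties share the lowest position in the group, leaving a gap after it.
lowest = s.rank(method='min').astype(int)    # 1, 2, 2, 4

# 'first': ties are broken by order of appearance, so every rank is unique.
first = s.rank(method='first').astype(int)   # 1, 2, 3, 4

print(dense.tolist(), lowest.tolist(), first.tolist())
```

Which one fits depends on whether tied values should share a rank and whether gaps in the ranking are acceptable.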

Upvotes: 3

EdChum

Reputation: 394159

Pass the parameter method='dense'; this makes the rank increase by exactly 1 between groups of tied values (see the docs):

In [2]:

numbers = np.random.randint(low=0, high=10001, size=1000)  # random_integers is deprecated; randint's upper bound is exclusive
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank(method='dense')
df
Out[2]:
        a  a_rank
0    1095     114
1    2514     248
2     500      53
3    6112     592
4    5582     533
5     851      91
6    2887     287
7    3798     366
8    4698     458
9    1699     170
10   4739     462
11   7199     693
12    817      88
13   3801     367
14   5584     534
15   4939     481
16   2569     258
17   6806     656
18     93       8
19   8574     816
20   4107     396
21   7086     684
22   6819     657
23   8844     847
24    170      15
25   6629     634
26   9905     950
27   5312     512
28   3794     365
29   9476     911
..    ...     ...
970  4607     447
971  8430     801
972  6527     625
973  2794     280
974  4414     425
975  1069     111
976  2849     285
977  7955     759
978  5767     547
979  7767     742
980  2956     294
981  5847     554
982  1029     107
983  4967     485
984   256      25
985  5577     532
986  6866     662
987  5903     563
988  1785     181
989   749      78
990  2164     212
991  1074     112
992  8752     837
993  2737     272
994  2761     277
995  7355     705
996  8956     857
997  4831     473
998   222      21
999  9531     917

[1000 rows x 2 columns]
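
One way to check the "increase by 1 between groups" behaviour is sketched below. It assumes a seeded generator (the seed is arbitrary) instead of the unseeded call above, and adds an astype(int) cast so the column has an integer dtype:

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.integers(low=0, high=10001, size=1000)})

# Dense ranks are whole numbers, so the cast to int is lossless.
df['a_rank'] = df['a'].rank(method='dense').astype(int)

# Dense ranks run 1, 2, ..., k without gaps, where k is the number of distinct values.
n_unique = df['a'].nunique()
print(df['a_rank'].min(), df['a_rank'].max(), n_unique)
```

The largest rank equals the number of distinct values in the column, which is what distinguishes 'dense' from the other tie-handling methods.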

Upvotes: 2
