Reputation: 19349
I want to replace the numbers in a column with their corresponding ranks:
import pandas as pd
import numpy as np
# random_integers is deprecated; randint excludes the upper bound, hence 10001
numbers = np.random.randint(low=0, high=10001, size=1000)
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank()
I get float values, which is the default output type of the rank method:
987 82.0
988 36.5
989 526.0
990 219.0
991 957.0
992 819.5
993 787.5
994 513.0
Instead of floats I would rather have integers. Rounding the resulting float values and casting them with astype(int) would be risky, since it would probably introduce duplicates from float values that are close to each other, such as 3.5 and 4.0. Rounded to integers, both would become 4.
Is there any way to make the rank method output integers?
Upvotes: 5
Views: 3435
Reputation: 518
No need to use method='dense', just convert to an integer.
df['a_rank'] = df['a'].rank().astype(int)
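To see what the plain cast does to tied values, here is a minimal sketch on a small hand-made Series (the values are made up for illustration and are not from the question). With the default average method, tied values share a fractional rank, and astype(int) truncates it rather than rounding it:

```python
import pandas as pd

# Hypothetical toy data: the two 20s are tied
s = pd.Series([10, 20, 20, 30])

avg = s.rank()            # default method='average': ties share the mean rank
as_int = avg.astype(int)  # the cast truncates the fractional part

print(avg.tolist())
print(as_int.tolist())
```

The tied values still share one integer rank after the cast, and the sequence has a gap after the tie, so whether this is acceptable depends on what the ranks are used for.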
Upvotes: 0
Reputation: 587
The above solution did not work for me. The following did work though. The critical line with edits is:
df['a_rank'] = df['a'].rank(method='dense').astype(int)
This could be a version issue.
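As a rough comparison, the sketch below applies three of the tie-breaking methods that rank accepts to a small made-up Series and casts each result to int. Which one fits depends on whether ties should share a rank and whether gaps are acceptable; the data here is purely illustrative:

```python
import pandas as pd

# Hypothetical toy data: the two 20s are tied
s = pd.Series([10, 20, 20, 30])

# dense: ties share a rank, and ranks have no gaps
dense = s.rank(method='dense').astype(int)

# first: ties are broken by order of appearance, so every rank is unique
first = s.rank(method='first').astype(int)

# min: ties share the lowest rank of the group, leaving a gap afterwards
lowest = s.rank(method='min').astype(int)

print(dense.tolist(), first.tolist(), lowest.tolist())
```

All three methods produce whole-number ranks, so the cast to int loses nothing, unlike the default average method.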
Upvotes: 3
Reputation: 394159
Pass the parameter method='dense'. With it, ranks always increase by 1 between groups of tied values. See the docs:
In [2]:
numbers = np.random.randint(low=0, high=10001, size=1000)  # random_integers is deprecated
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank(method='dense')
df
Out[2]:
a a_rank
0 1095 114
1 2514 248
2 500 53
3 6112 592
4 5582 533
5 851 91
6 2887 287
7 3798 366
8 4698 458
9 1699 170
10 4739 462
11 7199 693
12 817 88
13 3801 367
14 5584 534
15 4939 481
16 2569 258
17 6806 656
18 93 8
19 8574 816
20 4107 396
21 7086 684
22 6819 657
23 8844 847
24 170 15
25 6629 634
26 9905 950
27 5312 512
28 3794 365
29 9476 911
.. ... ...
970 4607 447
971 8430 801
972 6527 625
973 2794 280
974 4414 425
975 1069 111
976 2849 285
977 7955 759
978 5767 547
979 7767 742
980 2956 294
981 5847 554
982 1029 107
983 4967 485
984 256 25
985 5577 532
986 6866 662
987 5903 563
988 1785 181
989 749 78
990 2164 212
991 1074 112
992 8752 837
993 2737 272
994 2761 277
995 7355 705
996 8956 857
997 4831 473
998 222 21
999 9531 917
[1000 rows x 2 columns]
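For a reproducible check of the dense ranking, the following sketch assumes NumPy's default_rng generator and an arbitrary seed (neither appears in the original posts). In the pandas versions I am aware of, rank returns floats even with method='dense', so the explicit cast is kept:

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the run is repeatable; the seed is arbitrary
rng = np.random.default_rng(0)
numbers = rng.integers(low=0, high=10001, size=1000)  # upper bound is exclusive

df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank(method='dense').astype(int)

# Dense ranks run from 1 up to the number of distinct values, with no gaps
print(df['a_rank'].min(), df['a_rank'].max(), df['a'].nunique())
```

The highest dense rank equals the number of distinct values in the column, which is a quick way to confirm that no gaps were introduced.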
Upvotes: 2