Reputation: 3148

Pandas DENSE RANK

I'm dealing with pandas dataframe and have a frame like this:

I want to make an equialent to DENSE_RANK () over (order by year) function. to make an additional column like this:

    Year Value Rank
    2012  10    1
    2013  20    2
    2013  25    2
    2014  30    3

How can it be done in pandas?

Thanks!

Upvotes: 15

Answers (4)

piRSquared

Reputation: 294348

Use pd.Series.rank with method='dense'

df['Rank'] = df.Year.rank(method='dense').astype(int)

df

Upvotes: 32

ALollz

Reputation: 59549

`Groupby.ngroup`

Will sort keys by default so smaller years get labeled lower. Can set sort=False to rank groups based on order of occurrence.

df['Rank'] = df.groupby('Year', sort=True).ngroup()+1

`np.unique`

Also sorts, so use return_inverse to rank the smaller values lowest.

df['Rank'] = np.unique(df['Year'], return_inverse=True)[1]+1

Upvotes: 3

jezrael

Reputation: 862841

The fastest solution is factorize:

df['Rank'] = pd.factorize(df.Year)[0] + 1

Timings:

#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)

In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop

In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop

In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop

Upvotes: 12

Alexander

Reputation: 109546

You can convert the year to categoricals and then take their codes (adding one because they are zero indexed and you wanted the initial value to start with one per your example).

df['Rank'] = df.Year.astype('category').cat.codes + 1

>>> df
   Year  Value  Rank
0  2012     10     1
1  2013     20     2
2  2013     25     2
3  2014     30     3

Upvotes: 8

Pandas DENSE RANK

Answers (4)

Groupby.ngroup

np.unique

Related Questions

`Groupby.ngroup`

`np.unique`