Rank multiple columns in pandas

Question

I have a dataset of a series with missing values that I want to replace by the index. The second column contains the same numbers than the first column, but in a different order.

here's an example:

>>> df
ind    u    v   d
0      5    7   151
1      7    20  151
2      8    40  151
3      20   5   151

this should turn out to:

>>>df
ind    u    v   d
0      1    2   151
1      2    4   151
2      3    5   151
3      4    1   151

i reindexed the values in row 'u' by creating a new column:

>>>df['new_index'] = range(1, len(numbers) + 1)

but how do I now replace values of the second column referring to the indexes?

Thanks for any advice!

jezrael · Accepted Answer

You can use Series.rank, but first need create Series with unstack and last create DataFrame with unstack again:

df[['u','v']] = df[['u','v']].unstack().rank(method='dense').astype(int).unstack(0)
print (df)
     u  v    d
ind           
0    1  2  151
1    2  4  151
2    3  5  151
3    4  1  151

If use only DataFrame.rank, output in v is different:

df[['u','v']] = df[['u','v']].rank(method='dense').astype(int)
print (df)
     u  v    d
ind           
0    1  2  151
1    2  3  151
2    3  4  151
3    4  1  151

Rank multiple columns in pandas

Answers (1)

Related Questions