Crypticlight
Crypticlight

Reputation: 47

Pandas dataframe ranked by multiple columns (combination key)

Is there a pythonic way of sorting the dataframe described below by a combo key - first by ORD_DT_KEY then by ORD_TM_KEY and then by ORD_KEY; and store the rank in another column ORD_RANK?

Code below does not work since it creates 3 separate ranks and throws an error when I try to assign it to single column.

import pandas as pd

#Read orders data stored in a pipe delimeted file
df_raw_orders = pd.read_table("Dummy_order_data.txt", delimiter="|")

print df_raw_orders.head()

df_sorted_orders = df_raw_orders
cols = ['ORD_DT_KEY','ORD_TM_KEY','ORD_KEY']
df_sorted_orders['ORD_RANK'] = df_sorted_orders[cols].rank(method='dense')

Input data, Dummy_order_data.txt, is as below:

ORD_KEY|ORD_DT_KEY|ORD_TM_KEY|QTY|SKU_KEY
10001|1|0|1|1
10001|1|0|1|2
10001|1|0|1|3
10002|2|0|1|1
10002|2|0|1|3
10003|3|0|1|4
10004|4|0|1|4
10004|4|0|1|5
10005|5|0|1|1
10006|6|0|1|1
10006|6|0|1|4
10007|7|0|1|3
10007|7|0|1|4
10008|8|0|1|5
10009|9|0|1|1
10009|9|0|1|4
10009|9|0|1|5
10010|10|0|2|1
10010|10|0|2|2
10010|10|0|2|3
10011|11|0|1|1
10011|11|0|1|3
10012|12|0|1|4
10012|12|0|1|5
10013|13|0|1|1
10014|14|0|2|1
10014|14|0|2|4

Required output:

ORD_KEY ORD_DT_KEY  ORD_TM_KEY  SKU_KEY QTY ORD_RANK
10001   1   0   1   1   1
10001   1   0   2   1   1
10001   1   0   3   1   1
10002   2   0   1   1   2
10002   2   0   3   1   2
10003   3   0   4   1   3
10004   4   0   4   1   4
10004   4   0   5   1   4
10005   5   0   1   1   5
10006   6   0   1   1   6
10006   6   0   4   1   6
10007   7   0   3   1   7
10007   7   0   4   1   7
10008   8   0   5   1   8
10009   9   0   1   1   9
10009   9   0   4   1   9
10009   9   0   5   1   9
10010   10  0   1   2   10
10010   10  0   2   2   10
10010   10  0   3   2   10
10011   11  0   1   1   11
10011   11  0   3   1   11
10012   12  0   4   1   12
10012   12  0   5   1   12
10013   13  0   1   1   13
10014   14  0   1   2   14
10014   14  0   4   2   14

Upvotes: 1

Views: 297

Answers (1)

BENY
BENY

Reputation: 323236

Cause in your left there return 3 columns so you have to assign those back two 3 columns rather than one

for x in cols:
    df[x + 'rank'] = df[x].rank(method='dense') 

Upvotes: 2

Related Questions