Reputation: 47
I have a dataframe "df_orders_raw" like this:
ORD_KEY ORD_DT_KEY ORD_TM_KEY SKU_KEY QTY
933915294 7598 13 401550750 1
933915294 7598 13 409868344 1
933915294 7598 13 428852481 1
933919765 7598 13 432771563 2
933892844 7598 24 429377565 1
933892844 7598 24 413100006 1
933892844 7598 24 433950159 1
933908232 7598 38 427905127 1
933882107 7598 40 415639133 1
933882107 7598 40 428839526 1
933918543 7598 44 429503890 1
933918543 7598 44 429501262 1
I want to add an ORD_RANK column to this frame ranking data by ORD_DT_KEY, ORD_TM_KEY, ORD_KEY meaning, data should be grouped by ORD_DT_KEY first, and then ORD_TM_KEY will break first level ties followed by ORD_KEY.
Resulting ranks should look as below:
ORD_KEY ORD_DT_KEY ORD_TM_KEY SKU_KEY QTY ORD_RANK
933915294 7598 13 401550750 1 1
933915294 7598 13 409868344 1 1
933915294 7598 13 428852481 1 1
933919765 7598 13 432771563 2 2
933892844 7598 24 429377565 1 3
933892844 7598 24 413100006 1 3
933892844 7598 24 433950159 1 3
933908232 7598 38 427905127 1 4
933882107 7598 40 415639133 1 5
933882107 7598 40 428839526 1 5
933918543 7598 44 429503890 1 6
933918543 7598 44 429501262 1 6
After reading through a bunch of similar QnAs, my current code looks as below:
df_orders = df_orders_raw.copy()
col1 = df_orders['ORD_DT_KEY'].astype(str)
col2 = df_orders['ORD_TM_KEY'].astype(str)
col3 = df_orders['ORD_KEY'].astype(str)
key = col1+col2+col3
df_orders['ORD_RANK'] = (key).astype(float).rank(method='dense').astype(int)
This code works for most cases but starts breaking when key becomes very large (10^17 or larger) since it starts rounding up float numbers. It worked in Python 2.7 since I was using "long" instead of "float", but that's not supported in Python 3.5. If I use (key).astype(int), it throws an OverflowError. After reading similar questions, people suggested using float but that gives me another issue. How can I modify code to make it work for all cases?
Upvotes: 0
Views: 173
Reputation: 323376
Using ngroup
df['ORD_RANK']=df.groupby(['ORD_DT_KEY','ORD_TM_KEY','ORD_KEY']).ngroup()+1
df
Out[1010]:
ORD_KEY ORD_DT_KEY ORD_TM_KEY SKU_KEY QTY ORD_RANK
0 933915294 7598 13 401550750 1 1
1 933915294 7598 13 409868344 1 1
2 933915294 7598 13 428852481 1 1
3 933919765 7598 13 432771563 2 2
4 933892844 7598 24 429377565 1 3
5 933892844 7598 24 413100006 1 3
6 933892844 7598 24 433950159 1 3
7 933908232 7598 38 427905127 1 4
8 933882107 7598 40 415639133 1 5
9 933882107 7598 40 428839526 1 5
10 933918543 7598 44 429503890 1 6
11 933918543 7598 44 429501262 1 6
Upvotes: 2