Reputation: 460
I have a dataframe like below:
df.head(25)
ORDER_ID MILESTONE
19837715 00
19837715 00
19837715 M0
19837715 M2
19837715 M1
19841303 00
19841303 M0
19841303 00
19841303 M0
19841303 M2
19841303 M1
19841303 M3
19841333 M3
19841333 M4
19841333 M4
19841333 M7
19841333 M5
19841333 M6
19841333 M2
I am trying re-write the below query in pandas
select
order_id
, max(milestone) as current_milestone
from
df
group by
order_id
Level of hirerachy categorical column. goes from 0,M0,M1,M2,M3,M4,M5,M6,M7 where M7 is the highest.
How can this be done in pandas?
Upvotes: 0
Views: 40
Reputation: 153480
IIUC, let's try:
df['MILESTONE'] = pd.Categorical(df['MILESTONE'],
categories=sorted(df['MILESTONE'].unique(), key=lambda x: x[-1]),
ordered=True)
df.sort_values('MILESTONE', ascending=False).groupby('ORDER_ID').head(1)
Output:
ORDER_ID MILESTONE
15 19841333 M7
11 19841303 M3
3 19837715 M2
Upvotes: 3