How to do groupby maxium row selection in a pandas dataframe for categorical values

Question

I have a dataframe like below:

df.head(25)

    ORDER_ID    MILESTONE
    19837715    00
    19837715    00
    19837715    M0
    19837715    M2
    19837715    M1
    19841303    00
    19841303    M0
    19841303    00
    19841303    M0
    19841303    M2
    19841303    M1
    19841303    M3
    19841333    M3
    19841333    M4
    19841333    M4
    19841333    M7
    19841333    M5
    19841333    M6
    19841333    M2

I am trying re-write the below query in pandas

select
        order_id
        , max(milestone) as current_milestone

    from
        df
    group by
        order_id

Level of hirerachy categorical column. goes from 0,M0,M1,M2,M3,M4,M5,M6,M7 where M7 is the highest.

How can this be done in pandas?

Scott Boston · Accepted Answer

IIUC, let's try:

df['MILESTONE'] = pd.Categorical(df['MILESTONE'], 
               categories=sorted(df['MILESTONE'].unique(), key=lambda x: x[-1]),
               ordered=True)
df.sort_values('MILESTONE', ascending=False).groupby('ORDER_ID').head(1)

Output:

    ORDER_ID MILESTONE
15  19841333        M7
11  19841303        M3
3   19837715        M2

How to do groupby maxium row selection in a pandas dataframe for categorical values

Answers (1)

Related Questions