vivek
vivek

Reputation: 29

Selecting specific rows in DF based on 2 columns in python pandas

I have data from excel loaded into a pandas dataframe. I now want to select only those rows whose ASSESSMENT ID is the max ASSESSMENT ID per APPID and for all the UI SEQ NUMBERS for that APPID.

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 2493    11  Question    No .   
1   appname 13808   11  Question    Ctry of domicile .   
1   appname 13808   11  Question    Name .   
1   appname 35316   11  Question    Ctry of domicile .       
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .       
1   appname 2493    12  Question    Corp name .   
1   appname 2493    12  Question    Cr Br Scr .   
1   appname 2493    12  Question    Inc And Assests .   
1   appname 2493    12  Question    Int, Ext Reg Reports .   
1   appname 13808   12  Question    Corp name .   
1   appname 35316   12  Question    Corp name .   
1   appname 2493    13  Question    No .   
1   appname 13808   13  Question    No .   
1   appname 35316   13  Question    No .   
1   appname 2493    14  Question    No .   
1   appname 13808   14  Question    firms Pos .   
1   appname 35316   14  Question    firms Pos .   

And the result would be

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 35316   11  Question    Ctry of domicile .   
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .   
1   appname 35316   12  Question    Corp name .   
1   appname 35316   13  Question    No .   
1   appname 35316   14  Question    firms Pos .   

Upvotes: 1

Views: 40

Answers (1)

jezrael
jezrael

Reputation: 862441

I think you need boolean indexing with mask created by apply:

df1 = df[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].apply(lambda x:x==x.max())]
print (df1)
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
4       1  appname          35316             11  Question              Name.
5       1  appname          35316             11  Question       Nationality.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.

Or if dont need all duplicated values use idxmax:

df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()]
print (df1)
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.

Upvotes: 1

Related Questions