HHH

Reputation: 6465

How to do a groupby followed by an order by and then pick the first row in python/pandas

I have a dataframe. I want to group it by two columns, order each group by a third column, and then pick the first row from each group. This is the code I'm using:

first= df.groupby(['EMPLID','EMPL_RCD']).apply(lambda x: x.sort_values(by = ['EFFDT','EFFSEQ'], ascending = True)).first()

but I'm getting the following error when running it:

first() missing 1 required positional argument: 'offset'

What is missing here?

Upvotes: 1

Views: 3744

Answers (1)

jpp

Reputation: 164623

You can sort_values and then drop_duplicates:

res = df.sort_values(['EFFDT','EFFSEQ'])\
        .drop_duplicates(subset=['EMPLID','EMPL_RCD'])

Alternatively, you can sort and then use groupby + first:

res = df.sort_values(['EFFDT','EFFSEQ'])\
        .groupby(['EMPLID','EMPL_RCD']).first()
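To compare the two approaches side by side, here is a minimal sketch on made-up data. Only the column names come from the question; the values and the SALARY column are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 1, 2, 2],
    "EMPL_RCD": [0, 0, 0, 0, 0],
    "EFFDT":    pd.to_datetime(["2020-03-01", "2020-01-01", "2020-01-01",
                                "2021-06-01", "2021-05-01"]),
    "EFFSEQ":   [0, 1, 0, 0, 0],
    "SALARY":   [300, 200, 100, 500, 400],  # illustrative payload column
})

# Sort once so the earliest (EFFDT, EFFSEQ) row comes first for each key pair.
ordered = df.sort_values(["EFFDT", "EFFSEQ"])

# Approach 1: keep the first row seen for each (EMPLID, EMPL_RCD) pair.
# The result keeps the original row labels and all columns.
res_dedup = ordered.drop_duplicates(subset=["EMPLID", "EMPL_RCD"])

# Approach 2: aggregate each group with first().
# The grouping keys move into the index of the result.
res_first = ordered.groupby(["EMPLID", "EMPL_RCD"]).first()

print(res_dedup)
print(res_first)
```

Both results pick the same rows here. The main visible difference is the shape: drop_duplicates returns a flat frame, while groupby + first returns one indexed by the grouping keys (call reset_index() if you want them back as columns).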

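Your code does not work because groupby(...).apply(...) returns a regular dataframe, not a GroupBy object. Calling .first() on that result resolves to DataFrame.first, a time-series method that requires an offset argument, which is why you see the error. Call first directly on the GroupBy object instead, where it acts as an aggregating function, as in the second snippet above.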
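One caveat worth checking against your own data: GroupBy.first returns the first non-null value in each column, so with missing values the result may combine values from different rows. If you need the literal first row per group, GroupBy.head(1) is one option. The sketch below uses made-up data (the values and the SALARY column are assumptions) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the earliest row for EMPLID 1.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 2],
    "EMPL_RCD": [0, 0, 0],
    "EFFDT":    pd.to_datetime(["2020-01-01", "2020-03-01", "2021-05-01"]),
    "EFFSEQ":   [0, 0, 0],
    "SALARY":   [np.nan, 300.0, 400.0],  # illustrative payload column
})

keys = ["EMPLID", "EMPL_RCD"]
ordered = df.sort_values(["EFFDT", "EFFSEQ"])

# first() skips nulls column by column: SALARY for EMPLID 1 comes from
# the second row, while EFFDT still comes from the first row.
by_first = ordered.groupby(keys).first()

# head(1) returns the actual first row of each group, nulls included,
# and keeps the original row labels and columns.
by_head = ordered.groupby(keys).head(1)

print(by_first)
print(by_head)
```

If your columns contain no missing values, the two give the same rows and the choice is mostly about the shape of the output.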

Upvotes: 3
