How to do a groupby followed by order by and then pick the first row in python/pandas

Question

I have a DataFrame and I want to group by two columns, order each group by a third column, and then pick the first row from each group. This is the code I'm using:

first= df.groupby(['EMPLID','EMPL_RCD']).apply(lambda x: x.sort_values(by = ['EFFDT','EFFSEQ'], ascending = True)).first()

but I'm getting the following error when running it:

first() missing 1 required positional argument: 'offset'

What is missing here?

jpp · Accepted Answer

You can sort_values and then drop_duplicates:

res = df.sort_values(['EFFDT','EFFSEQ'])\
        .drop_duplicates(subset=['EMPLID','EMPL_RCD'])
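A minimal runnable sketch of the sort_values + drop_duplicates approach. The sample rows and the SALARY column are made up for illustration; only the key and sort column names come from the question. drop_duplicates keeps the first occurrence by default, so after sorting that is the earliest row per (EMPLID, EMPL_RCD) pair.

```python
import pandas as pd

# Hypothetical sample data; only the column names EMPLID, EMPL_RCD,
# EFFDT and EFFSEQ come from the question. SALARY is an invented payload.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 1, 2, 2],
    "EMPL_RCD": [0, 0, 0, 0, 0],
    "EFFDT":    ["2020-01-01", "2019-01-01", "2019-01-01",
                 "2021-06-01", "2021-03-01"],
    "EFFSEQ":   [0, 1, 0, 0, 0],
    "SALARY":   [100, 90, 80, 200, 150],
})

# Sort by the ordering columns, then keep the first row seen for each key pair.
res = (
    df.sort_values(["EFFDT", "EFFSEQ"])
      .drop_duplicates(subset=["EMPLID", "EMPL_RCD"])
)

print(res)
```

The result keeps the original row labels and all columns, so each output row is an actual row of df.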

Alternatively, you can sort and then use groupby + first:

res = df.sort_values(['EFFDT','EFFSEQ'])\
        .groupby(['EMPLID','EMPL_RCD']).first()
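One caveat worth knowing: GroupBy.first returns the first non-null value of each column within a group, so when the data contains missing values the result may combine values from different rows. The sketch below, on made-up data with an invented SALARY column, contrasts it with GroupBy.head(1), which returns whole rows. Whether this difference matters depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in the earliest row of EMPLID 1.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 2],
    "EMPL_RCD": [0, 0, 0],
    "EFFDT":    ["2019-01-01", "2020-01-01", "2021-01-01"],
    "EFFSEQ":   [0, 0, 0],
    "SALARY":   [np.nan, 100.0, 150.0],
})

ordered = df.sort_values(["EFFDT", "EFFSEQ"])
keys = ["EMPLID", "EMPL_RCD"]

# first(): per-column first non-null value; the keys become the index.
by_first = ordered.groupby(keys).first()

# head(1): the first whole row of each group, nulls included;
# the original row labels and columns are preserved.
by_head = ordered.groupby(keys).head(1)

print(by_first)
print(by_head)
```

For EMPLID 1, by_first pairs the 2019 EFFDT with the SALARY from the 2020 row, while by_head keeps the 2019 row with its missing SALARY.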

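Your code does not work because groupby(...).apply(...) returns a DataFrame, so the trailing .first() is DataFrame.first, which requires an offset argument, rather than GroupBy.first. Call first directly on the GroupBy object instead, where it acts as an aggregating function.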
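To see the difference, it can help to inspect the types involved. This is a small sketch on made-up data; the variable names are illustrative, and depending on your pandas version the apply call may emit a deprecation warning about grouping columns.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 2],
    "EMPL_RCD": [0, 0, 0],
    "EFFDT":    ["2020-01-01", "2019-01-01", "2021-01-01"],
    "EFFSEQ":   [0, 0, 0],
})
keys = ["EMPLID", "EMPL_RCD"]

# As in the question: apply() concatenates the sorted groups back together,
# so the result is an ordinary DataFrame, not a GroupBy object.
sorted_groups = df.groupby(keys).apply(
    lambda x: x.sort_values(by=["EFFDT", "EFFSEQ"])
)
print(type(sorted_groups).__name__)

# Sorting first and grouping afterwards leaves a GroupBy object,
# on which first() is available as an aggregation.
grouped = df.sort_values(["EFFDT", "EFFSEQ"]).groupby(keys)
print(type(grouped).__name__)
```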
