Reputation: 6465
I have a dataframe that I want to group by two columns, order by a third column, and then pick the first row from each group. This is the code I'm using:
first = df.groupby(['EMPLID', 'EMPL_RCD']).apply(lambda x: x.sort_values(by=['EFFDT', 'EFFSEQ'], ascending=True)).first()
but I'm getting the following error when running it:
first() missing 1 required positional argument: 'offset'
What is missing here?
Upvotes: 1
Views: 3744
Reputation: 164623
You can use sort_values and then drop_duplicates:
res = df.sort_values(['EFFDT','EFFSEQ'])\
.drop_duplicates(subset=['EMPLID','EMPL_RCD'])
Alternatively, you can sort and then use groupby + first:
res = df.sort_values(['EFFDT','EFFSEQ'])\
.groupby(['EMPLID','EMPL_RCD']).first()
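To compare the two approaches above side by side, here is a minimal sketch. The sample rows and the SALARY column are made up for illustration; only EMPLID, EMPL_RCD, EFFDT and EFFSEQ come from the question.

```python
import pandas as pd

# Hypothetical sample data; SALARY and all values are invented for this sketch.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 1, 2, 2],
    "EMPL_RCD": [0, 0, 0, 0, 0],
    "EFFDT":    pd.to_datetime(
        ["2020-03-01", "2020-01-01", "2020-01-01", "2021-06-01", "2021-05-01"]
    ),
    "EFFSEQ":   [0, 1, 0, 0, 0],
    "SALARY":   [300, 200, 100, 500, 400],
})

keys = ["EMPLID", "EMPL_RCD"]

# Sort once so the earliest EFFDT/EFFSEQ row comes first within each key pair.
ordered = df.sort_values(["EFFDT", "EFFSEQ"])

# Approach 1: keep the first row encountered for each key pair.
by_dedup = ordered.drop_duplicates(subset=keys)

# Approach 2: aggregate with first(); the keys become the index,
# so reset_index() turns them back into ordinary columns.
by_first = ordered.groupby(keys).first().reset_index()

print(by_dedup)
print(by_first)
```

One difference worth knowing: drop_duplicates keeps whole rows and the original index, while GroupBy.first takes the first non-null value of each column, so the two can differ when the data contains missing values.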
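Your code does not work because you are calling first on a DataFrame rather than on a GroupBy object: groupby(...).apply(...) already returns a DataFrame, and DataFrame.first expects an offset argument. Call first directly on the GroupBy object so that it acts as an aggregating function.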
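A small sketch of the type difference, using made-up sample values (only the column names come from the question). Depending on your pandas version, the apply call may emit a deprecation warning about grouping columns.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "EMPLID":   [1, 1, 2],
    "EMPL_RCD": [0, 0, 0],
    "EFFDT":    pd.to_datetime(["2020-03-01", "2020-01-01", "2021-05-01"]),
    "EFFSEQ":   [0, 0, 0],
})

grouped = df.groupby(["EMPLID", "EMPL_RCD"])

# apply() with a function that returns a DataFrame per group
# concatenates the pieces back into a plain DataFrame.
applied = grouped.apply(lambda g: g.sort_values(["EFFDT", "EFFSEQ"]))

print(type(grouped).__name__)  # the GroupBy object: .first() aggregates per group
print(type(applied).__name__)  # a DataFrame: .first() here is the offset-based method
```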
Upvotes: 3