Reputation: 361
I'm trying to remove duplicate values base on the initial date and the policy number, keeping the first entry on a date. for example if I have two records like:
policy_number initial_date
123456 01/02/97
123456 01/02/99
and the output I want should be look like this.
policy_number initial_date
123456 01/02/97
till' now I've tried this:
df.drop_duplicates(['POLICY_NUMBER','initial_date'], keep="first", inplace=True)
df.sort_values('initial_date').drop_duplicates('initial_date',keep='first')
but still not worthit, the output looks same as the original or drop only base on the date ignoring the policy number
fyi both variables are object dtype
Upvotes: 0
Views: 48
Reputation: 30930
Use:
df.sort_values('initial_date').drop_duplicates('policy_number',keep='first')
or
df.sort_values('initial_date').groupby('policy_number',as_index=False).first()
Output
policy_number initial_date
0 123456 01/02/97
Upvotes: 1