Removing duplicate values base on a dat and a second column keeping the min value

Question

I'm trying to remove duplicate values base on the initial date and the policy number, keeping the first entry on a date. for example if I have two records like:

 policy_number        initial_date
    123456                  01/02/97
    123456                  01/02/99

and the output I want should be look like this.

 policy_number        initial_date
    123456                  01/02/97

till' now I've tried this:

df.drop_duplicates(['POLICY_NUMBER','initial_date'], keep="first", inplace=True)

 df.sort_values('initial_date').drop_duplicates('initial_date',keep='first')

but still not worthit, the output looks same as the original or drop only base on the date ignoring the policy number

fyi both variables are object dtype

ansev · Accepted Answer

Use:

df.sort_values('initial_date').drop_duplicates('policy_number',keep='first')

or

df.sort_values('initial_date').groupby('policy_number',as_index=False).first()

Output

   policy_number initial_date
0         123456     01/02/97

Removing duplicate values base on a dat and a second column keeping the min value

Answers (1)

Related Questions