user12625679

Reputation: 696

Find first occurrence in pandas df for a given column

For each customer_id, I would like to find the first partition_date on which that customer appears in the dataset. This is what I've tried so far, but I get the error below:

df['first_seen_date'] = df.sort_values('partition_date').groupby('customer_id').first()

error:  Wrong number of items passed 3, placement implies 1

df

customer_id  partition_date  
24242        01.01.2020
24242        02.01.2020
24242        04.01.2020
35439        06.01.2020
35439        05.01.2020
35439        07.01.2020

desired output df

customer_id  first_seen_date
24242        01.01.2020
35439        05.01.2020

Upvotes: 1

Views: 32

Answers (1)

jezrael

Reputation: 862511

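You are close. groupby(...).first() returns a DataFrame with one row per customer_id, so it cannot be assigned to a single column of df. Assign the result to a new DataFrame instead: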

df1 = df.sort_values('partition_date').groupby('customer_id', as_index=False).first()

Or use DataFrame.drop_duplicates:

df1 = df.sort_values('partition_date').drop_duplicates('customer_id')
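
A minimal end-to-end sketch using the sample data from the question. It rests on a few assumptions that are not in the original answer: the dates are read as day-first strings (DD.MM.YYYY) and parsed with pd.to_datetime so that sorting is chronological rather than alphabetical, the result column is renamed to first_seen_date to match the desired output, and the last line shows one possible way to get the column on the original df via transform('min'), in case that was the intent of the attempted assignment.

```python
import pandas as pd

# Sample data from the question; dates are strings in DD.MM.YYYY form.
df = pd.DataFrame({
    "customer_id": [24242, 24242, 24242, 35439, 35439, 35439],
    "partition_date": ["01.01.2020", "02.01.2020", "04.01.2020",
                       "06.01.2020", "05.01.2020", "07.01.2020"],
})

# Assumption: the format is day.month.year. Parsing to datetime makes
# sort_values order the rows chronologically instead of as text.
df["partition_date"] = pd.to_datetime(df["partition_date"], format="%d.%m.%Y")

# One row per customer: sort by date, keep the first row of each customer_id.
df1 = (
    df.sort_values("partition_date")
      .drop_duplicates("customer_id")
      .rename(columns={"partition_date": "first_seen_date"})
      .reset_index(drop=True)
)
print(df1)

# Alternative (not part of the answer above): keep every row of df and
# add each customer's earliest date as a new column.
df["first_seen_date"] = df.groupby("customer_id")["partition_date"].transform("min")
print(df)
```

If partition_date stays a string, the sort happens to work for this sample but would misorder dates that span different months or years, which is why the conversion step is included here.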

Upvotes: 3
