Group data by one column and select first occurences from two other columns

Question

I need to .groupby() using customer, and then add a column for the date in which the customer made his/her first purchase, and add another column for the corresponding purchase amount.

Here is my code. I am doing the first part wrong and don't know how to do the second. I've tried .loc and .idxmin ....

mydata = [{'amount': 3200, 'close_date':'2013-03-31', 'customer': 'Customer 1',},
          {'amount': 1430, 'close_date':'2013-11-30', 'customer': 'Customer 1',},
          {'amount': 4320, 'close_date':'2014-03-31', 'customer': 'Customer 2',},
          {'amount': 2340, 'close_date':'2015-05-18', 'customer': 'Customer 2',},
          {'amount': 4320, 'close_date':'2015-06-29', 'customer': 'Customer 2',},]

df = pd.DataFrame(mydata)
df.close_date = pd.to_datetime(df.close_date)
df['first_date'] = df.groupby('customer')['close_date'].min().apply(lambda x: x.strftime('%Y-%m'))

Stefan · Accepted Answer

If you sort your data by close_date, you can do as follows:

df.sort_values('close_date').groupby(['customer'])['close_date', 'amount'].first()

           close_date  amount
customer                     
Customer 1 2013-03-31    3200
Customer 2 2014-03-31    4320

.sort_values() has been added in 0.17, used to be sort() (see docs).

Group data by one column and select first occurences from two other columns

Answers (2)

Related Questions