Reputation: 141
I need to .groupby()
using customer
, and then add a column for the date in which the customer made his/her first purchase, and add another column for the corresponding purchase amount.
Here is my code. I am doing the first part wrong and don't know how to do the second. I've tried .loc
and .idxmin
....
mydata = [{'amount': 3200, 'close_date':'2013-03-31', 'customer': 'Customer 1',},
{'amount': 1430, 'close_date':'2013-11-30', 'customer': 'Customer 1',},
{'amount': 4320, 'close_date':'2014-03-31', 'customer': 'Customer 2',},
{'amount': 2340, 'close_date':'2015-05-18', 'customer': 'Customer 2',},
{'amount': 4320, 'close_date':'2015-06-29', 'customer': 'Customer 2',},]
df = pd.DataFrame(mydata)
df.close_date = pd.to_datetime(df.close_date)
df['first_date'] = df.groupby('customer')['close_date'].min().apply(lambda x: x.strftime('%Y-%m'))
Upvotes: 1
Views: 182
Reputation: 42875
If you sort your data by close_date
, you can do as follows:
df.sort_values('close_date').groupby(['customer'])['close_date', 'amount'].first()
close_date amount
customer
Customer 1 2013-03-31 3200
Customer 2 2014-03-31 4320
.sort_values()
has been added in 0.17
, used to be sort()
(see docs).
Upvotes: 3
Reputation: 28936
Two steps.
First the day of first purchase:
In [34]: first = df.groupby('customer').close_date.min()
In [35]: first
Out[35]:
customer
Customer 1 2013-03-31
Customer 2 2014-03-31
Name: close_date, dtype: object
We'll use first as an indexer,
In [36]: idx = pd.MultiIndex.from_tuples(list(first.iteritems()), names=['customer', 'close_date'])
In [37]: idx
Out[37]:
MultiIndex(levels=[['Customer 1', 'Customer 2'], ['2013-03-31', '2014-03-31']],
labels=[[0, 1], [0, 1]])
For a DataFrame with those two levels
In [38]: df2 = df.set_index(['customer', 'close_date'])
In [39]: df2.loc[idx]
Out[39]:
amount
customer close_date
Customer 1 2013-03-31 3200
Customer 2 2014-03-31 4320
This is a series, you can use .unstack()
to get back a DataFrame.
Upvotes: 0