Reputation: 446
I have a sales table with custid, transaction date column, etc. I am using groupby on the custid column, and then using the agg method to get the max date( to get the latest date of transaction by that particular customer) and min date ( to get the first date of his transaction at the shop).
My code is as below:
sales['transdate'] = pd.to_datetime(sales['transdate']) # Converting the transdate column from string to timestamps.
sales['custid'].groupby.transdate({'count': np.count_nonzero ,'first': np.min, 'last' : np.max})
I would like to know whether its okay to
calculate min and max between dates by using np.min/max methods. or should I be using some other datetime related methods?
Upvotes: 1
Views: 2399
Reputation: 164713
You should use groupby.agg
to apply multiple aggregation functions.
Note also that with Pandas many aggregation functions can be called via strings. In this case, you can use 'size'
, 'min'
and 'max'
. Using strings is recommended since the string representation is mapped by Pandas to tested and efficient algorithms.
Here's a demo:
df = pd.DataFrame([['2017-01-14', 1], ['2017-12-05', 2], ['2017-06-15', 2],
['2017-03-21', 1], ['2017-04-25', 2], ['2017-02-12', 1]],
columns=['transdate', 'custid'])
df['transdate'] = pd.to_datetime(df['transdate'])
agg_dict = {'count': 'size', 'first': 'min', 'last': 'max'}
res = df.groupby('custid')['transdate'].agg(agg_dict)
print(res)
count first last
custid
1 3 2017-01-14 2017-03-21
2 3 2017-04-25 2017-12-05
Upvotes: 1