aspiring1
aspiring1

Reputation: 446

Using numpy.max/ numpy.min for timestamp values

I have a sales table with custid, transaction date column, etc. I am using groupby on the custid column, and then using the agg method to get the max date( to get the latest date of transaction by that particular customer) and min date ( to get the first date of his transaction at the shop).

My code is as below:

sales['transdate'] = pd.to_datetime(sales['transdate']) # Converting the transdate column from string to timestamps.
sales['custid'].groupby.transdate({'count': np.count_nonzero ,'first': np.min, 'last' : np.max})

I would like to know whether its okay to

calculate min and max between dates by using np.min/max methods. or should I be using some other datetime related methods?

Upvotes: 1

Views: 2399

Answers (1)

jpp
jpp

Reputation: 164713

You should use groupby.agg to apply multiple aggregation functions.

Note also that with Pandas many aggregation functions can be called via strings. In this case, you can use 'size', 'min' and 'max'. Using strings is recommended since the string representation is mapped by Pandas to tested and efficient algorithms.

Here's a demo:

df = pd.DataFrame([['2017-01-14', 1], ['2017-12-05', 2], ['2017-06-15', 2],
                   ['2017-03-21', 1], ['2017-04-25', 2], ['2017-02-12', 1]],
                  columns=['transdate', 'custid'])

df['transdate'] = pd.to_datetime(df['transdate'])

agg_dict = {'count': 'size', 'first': 'min', 'last': 'max'}

res = df.groupby('custid')['transdate'].agg(agg_dict)

print(res)

        count      first       last
custid                             
1           3 2017-01-14 2017-03-21
2           3 2017-04-25 2017-12-05

Upvotes: 1

Related Questions