Pandas Groupby Dates, then Cumprod of Group?

Question

I have a list of values with datetimes:

     Datetime         Val 
[[2017-01-01 15:00:00, 2],
 [2017-02-05 19:00:00, 3],
 [2018-04-22 15:00:00, 6],
 [2018-08-02 13:00:00, 3],
 [2018-10-03 12:00:00, 3]]

I want to group values into N number of equally spaced bins by datetime and then get a list of the cumprod of vals for each group, if a group bin is empty, the cumprod is 1.

My current approach is calculating the first and last timestamp, then using linspace to calculate the equally spaced datetime bins, this is where I'm stuck:

n = 5 # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['datetime'])
end = pd.Timestamp(df.iloc[-1]['datetime'])
bins = np.linspace(start.value, end.value, n+1) # n+1 as linspace is right bound including
groups = pd.to_datetime(bins).values

Returns:

 ['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
 '2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
 '2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']

Output with 5 equally spaced bins and the above given example values could be for example:

 output = [2*3, 1, 1, 6, 3*3] # 1 if there is no "Val" for a bin

Is there any efficient/clean way to solve this? I have looked into pd.Grouper but I can't get the freq value to work to output equally spaced datetime groups. Another solution I tried is turning datetimes into epochs, and then using np.digitize to categorize by bins. But this also didn't work out. Appreciate any help, Numpy solutions also welcome.

ALollz · Accepted Answer

You can use pd.cut to specify your bins easily. Then you need groupby + prod.

df.groupby(pd.cut(df.Datetime, bins=5, right=False)).Val.prod()

Output:

Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00)           6
[2017-05-09 14:24:00, 2017-09-14 13:48:00)           1
[2017-09-14 13:48:00, 2018-01-20 13:12:00)           1
[2018-01-20 13:12:00, 2018-05-28 12:36:00)           6
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000)    9
Name: Val, dtype: int64

We automatically get your desired behavior of missing groups being filled with 1 becasuse with prod, empty Series and ndarrays multiply to 1.

import numpy as np

np.prod(pd.Series())
#1.0

np.prod(np.ndarray(shape=0))
#1.0

Pandas Groupby Dates, then Cumprod of Group?

Answers (1)

Output:

Related Questions