Reputation: 869
I have a list of values with datetimes:
Datetime Val
[[2017-01-01 15:00:00, 2],
[2017-02-05 19:00:00, 3],
[2018-04-22 15:00:00, 6],
[2018-08-02 13:00:00, 3],
[2018-10-03 12:00:00, 3]]
I want to group values into N number of equally spaced bins by datetime and then get a list of the cumprod of vals for each group, if a group bin is empty, the cumprod is 1.
My current approach is calculating the first and last timestamp, then using linspace to calculate the equally spaced datetime bins, this is where I'm stuck:
n = 5 # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['datetime'])
end = pd.Timestamp(df.iloc[-1]['datetime'])
bins = np.linspace(start.value, end.value, n+1) # n+1 as linspace is right bound including
groups = pd.to_datetime(bins).values
Returns:
['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
'2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
'2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']
Output with 5 equally spaced bins and the above given example values could be for example:
output = [2*3, 1, 1, 6, 3*3] # 1 if there is no "Val" for a bin
Is there any efficient/clean way to solve this? I have looked into pd.Grouper but I can't get the freq value to work to output equally spaced datetime groups. Another solution I tried is turning datetimes into epochs, and then using np.digitize to categorize by bins. But this also didn't work out. Appreciate any help, Numpy solutions also welcome.
Upvotes: 2
Views: 603
Reputation: 59569
You can use pd.cut
to specify your bins easily. Then you need groupby
+ prod
.
df.groupby(pd.cut(df.Datetime, bins=5, right=False)).Val.prod()
Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00) 6
[2017-05-09 14:24:00, 2017-09-14 13:48:00) 1
[2017-09-14 13:48:00, 2018-01-20 13:12:00) 1
[2018-01-20 13:12:00, 2018-05-28 12:36:00) 6
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000) 9
Name: Val, dtype: int64
We automatically get your desired behavior of missing groups being filled with 1 becasuse with prod
, empty Series
and ndarrays
multiply to 1.
import numpy as np
np.prod(pd.Series())
#1.0
np.prod(np.ndarray(shape=0))
#1.0
Upvotes: 2