Reputation: 2334
The rolling function in Pandas can only calculate rolling statistics according to row counts or date/time columns. But I want to have a discrete time column for calculating rolling sum, something like this:
key time value
A 1 10
A 2 20
A 4 30
A 7 10
B 1 15
B 2 30
B 3 15
I want to first group by key
, then calculate the rolling sum on value
for the nearest 3 time
:
key time value output
A 1 10 10
A 2 20 30(10+20)
A 4 30 60(10+20+30)
A 7 10 40(30+10)
B 1 15 15
B 2 30 45
B 3 15 60
I tried this:
grouped = input.groupby("key", as_index=False)
for name, group in grouped:
group = group.sort_values("time")
time = list(group["time"])
value = list(group["value"])
#calcRollingStat is a custom function that outputs a list of corresponding results
out = calcRollingStat(time, value, mode="avg")
group["output"] = out #out is a list
But then I don't know how to convert grouped
back to DataFrame. Pandas tells me that there is no reset_index
attribute in grouped
.
Is my code the best method to do this? How would you tackle this problem?
Thank you!
Upvotes: 0
Views: 120
Reputation: 862511
I believe you can use GroupBy.apply
with custom function:
def f(group):
group = group.sort_values("time")
time = list(group["time"])
value = list(group["value"])
#calcRollingStat is a custom function that outputs a list of corresponding results
group["output"] = calcRollingStat(time, value, mode="avg")
return group
df = input.groupby("key", as_index=False).apply(f)
Upvotes: 1