DarkZero

Reputation: 2334

How to calculate the rolling sum on custom time columns?

As far as I can tell, the rolling function in pandas can only compute rolling statistics over a fixed number of rows or over a date/time column. I want to use a discrete (integer) time column as the basis for the window of a rolling sum, something like this:

key time value
A   1    10
A   2    20
A   4    30
A   7    10
B   1    15
B   2    30
B   3    15

I want to first group by key, then calculate the rolling sum of value over the rows whose time falls within the last 3 time units (from time - 3 to time, inclusive):

key time value output
A   1    10    10
A   2    20    30(10+20)
A   4    30    60(10+20+30)
A   7    10    40(30+10)
B   1    15    15
B   2    30    45
B   3    15    60

I tried this:

grouped = input.groupby("key", as_index=False)
for name, group in grouped:
    group = group.sort_values("time")
    time = list(group["time"])
    value = list(group["value"])

    #calcRollingStat is a custom function that outputs a list of corresponding results
    out = calcRollingStat(time, value, mode="avg") 
    group["output"] = out #out is a list

But then I don't know how to convert grouped back into a DataFrame. Pandas tells me that grouped has no reset_index attribute.

Is my code the best method to do this? How would you tackle this problem?

Thank you!

Upvotes: 0

Views: 120

Answers (1)

jezrael

Reputation: 862511

I believe you can use GroupBy.apply with a custom function:

def f(group):
    group = group.sort_values("time")
    time = list(group["time"])
    value = list(group["value"])

    # calcRollingStat is a custom function that outputs a list of corresponding results
    group["output"] = calcRollingStat(time, value, mode="avg")
    return group


df = input.groupby("key", as_index=False).apply(f)
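
Since calcRollingStat is not shown in the question, here is a minimal self-contained sketch of the same idea. The helper rolling_sum_by_time is a hypothetical stand-in for calcRollingStat, and it assumes the window covers time - 3 to time inclusive, which is what the expected output in the question implies. It collects the per-group results with pd.concat rather than GroupBy.apply, which is one way to turn the groups back into a single DataFrame.

```python
import numpy as np
import pandas as pd


def rolling_sum_by_time(group, window=3):
    """Hypothetical stand-in for calcRollingStat (assumption, not from the question).

    For each row, sums `value` over the rows of the same group whose
    `time` lies in [time - window, time], both ends inclusive.
    """
    group = group.sort_values("time")
    t = group["time"].to_numpy()
    v = group["value"].to_numpy()

    # Prefix sums with a leading 0, so csum[j] - csum[i] == v[i:j].sum()
    csum = np.concatenate(([0], np.cumsum(v)))

    # First row whose time is >= time - window
    start = np.searchsorted(t, t - window, side="left")
    # One past the last row whose time is <= time (handles repeated times)
    end = np.searchsorted(t, t, side="right")

    return group.assign(output=csum[end] - csum[start])


# Sample data from the question
df = pd.DataFrame({
    "key": list("AAAABBB"),
    "time": [1, 2, 4, 7, 1, 2, 3],
    "value": [10, 20, 30, 10, 15, 30, 15],
})

# Process each group, then stitch the pieces back into one DataFrame
parts = [rolling_sum_by_time(g) for _, g in df.groupby("key")]
result = pd.concat(parts).sort_index()
print(result)
```

The prefix-sum plus searchsorted approach avoids a Python loop over rows; if the window should be defined differently (for example, excluding the left edge), only the two searchsorted calls need to change.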

Upvotes: 1
