Jack Armstrong
Jack Armstrong

Reputation: 1239

Variable Running Total in Pandas dataframe

I am trying to create a function in python where the user inputs a column of meaning full values and a number, call it X. The function will then create a running total over X of those meaning full values over.

For example, for a series A, and X = 3:

import random 
mylist = []
for i in range(0,100):
    x = random.randint(1,10)
    mylist.append(x)
a = pd.Series(mylist)
a.head()
Out:
 0    2
 1    5
 2    2
 3    8
 4    8
 5    6

so my_func() would return this new sereis (as the third column):

 0    2   Nan
 1    5   Nan
 2    2   9 # 2+5+2
 3    8   15 # 5+2+8
 4    8   18 # 2+8+8
 5    6   22 # 8+8+6

I thought of using cumsum() in some way, but had trouble wrapping my head around the X part of it. And truthfully I did not know where to start.

Upvotes: 0

Views: 161

Answers (3)

Victor Domingos
Victor Domingos

Reputation: 1093

Not sure if you are looking into a more optimised Pandas or Numpy solution, but for the sake of completeness, I am suggesting here a pure-python version of the function. It implements a generator, so that in many cases it may be a quite simple, yet performant, option, without any external dependencies:


def my_func2(data, x):
    i = 1
    for n in data:
        if i < x:
            yield 0  # ignore the first numbers
        else:       
            yield sum(data[i-x:i]) # sum the current number and previous x-1 numbers.
        i += 1



mylist = [2, 5, 2, 8, 8, 6]

# The enumerate() part is just a way to generate the counting on the left
for i, n in enumerate(my_func2(mylist, 3)):
    print(f"{i:5}: {n:9}")

Upvotes: 0

Mayank Porwal
Mayank Porwal

Reputation: 34086

You can use rolling sum and specify the window(in your case 3). Some like this would work:

In [348]: df                                                                                                                                                                                                
Out[348]: 
0  2
1  5
2  2
3  8
4  8
5  6

In [350]: df.rolling(window=3).sum()                                                                                                                                                                     
Out[350]: 
0     NaN
1     NaN
2     9.0
3    15.0
4    18.0
5    22.0
Name: A, dtype: float64

Upvotes: 2

wwnde
wwnde

Reputation: 26686

Data

df=pd.read_clipboard()
df

enter image description here

Rolling sum

df['runningsum']=df.rolling(3).s.sum()
df

Output

enter image description here

Upvotes: 1

Related Questions