steve

Reputation: 3

Limit cumsum to only the previous 4 rows

Beginner's question:

I want to add a cumulative sum column to my dataframe, but each value should only sum the last 4 rows (the current row plus the 3 before it). The sum also needs to restart with each new 'Type' in the frame.

This is what I'm going for:

Type    Value    Desired column
  A       1        -
  A       2        -
  A       1        -
  A       1        5
  A       2        6
  A       2        6
  B       2        -
  B       2        -
  B       2        -
  B       2        8
  B       1        7
  B       1        6

Upvotes: 0

Views: 731

Answers (1)

DSM

Reputation: 353604

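You can do this by grouping on Type and taking a rolling sum with a window of 4 within each group. (Older pandas spelled this pd.rolling_sum(x, 4); that function has since been removed in favour of the .rolling() method.) For example:

>>> df["sum4"] = df.groupby("Type")["Value"].transform(lambda x: x.rolling(4).sum())
>>> df
   Type  Value  sum4
0     A      1   NaN
1     A      2   NaN
2     A      1   NaN
3     A      1   5.0
4     A      2   6.0
5     A      2   6.0
6     B      2   NaN
7     B      2   NaN
8     B      2   NaN
9     B      2   8.0
10    B      1   7.0
11    B      1   6.0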

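pandas uses NaN to represent missing data; if you really want '-' instead, you could do that too. Note that the column then holds a mix of numbers and strings (object dtype), so it's best done only for display: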

df["sum4"] = df["sum4"].fillna('-')
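
If you want to try this end to end, here is a minimal self-contained sketch. The DataFrame construction is my assumption (the question only shows the printed table), and it uses the .rolling() method, since pd.rolling_sum is not available in recent pandas releases.

```python
import pandas as pd

# Rebuild the example frame from the question. The construction is assumed;
# the original post only shows the printed table.
df = pd.DataFrame({
    "Type": list("AAAAAABBBBBB"),
    "Value": [1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1],
})

# Within each Type, sum the current row and the 3 rows before it.
# transform returns a result aligned with df's original index, so it can be
# assigned straight back as a column. Rows with fewer than 4 values
# available in their group come out as NaN.
df["sum4"] = df.groupby("Type")["Value"].transform(lambda s: s.rolling(4).sum())

print(df)
```

The window never crosses from one Type into the next because the rolling sum is computed separately on each group's values.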

Upvotes: 4
