Avi
Avi

Reputation: 1845

How do I get a column value based on rolling sum of another column?

I want to create a data set and then determine the value of 2nd column based on the numbers inserted in 1st column.

First I would like to create a data set having 2 columns (keeping one column blank)

Data set I want to create: (In numbers column I want to insert all the consecutive numbers from 1 to 50)

 Expected Output: 

  Numbers      Sum  
    1          55   (this is the sum of first 10 consecutive numbers)
    2          65   (this is the sum of consecutive 10 numbers from 2 to 11)
    3          75   (this is the sum of consecutive 10 numbers from 3 to 12)
    4          85   (this is the sum of consecutive 10 numbers from 4 to 13)
    5
    6
    7
    8 
    9 
    10 
    11 
    12
    13

and so on and I want to calculate the sum for the values from 1 to 50 by inserting values from 1 to 50 in column numbers and get the sum of the 10 consecutive numbers.

Upvotes: 0

Views: 252

Answers (1)

jezrael
jezrael

Reputation: 863301

It is possible, but then get first or last NaNs with rolling:

df['a'] = df['Numbers'].rolling(10).sum()
print (df)
    Numbers     a
0         1   NaN
1         2   NaN
2         3   NaN
3         4   NaN
4         5   NaN
5         6   NaN
6         7   NaN
7         8   NaN
8         9   NaN
9        10  55.0
10       11  65.0
11       12  75.0
12       13  85.0

df['a'] = df['Numbers'].rolling(10).sum().shift(-9)
print (df)
    Numbers     a
0         1  55.0
1         2  65.0
2         3  75.0
3         4  85.0
4         5   NaN
5         6   NaN
6         7   NaN
7         8   NaN
8         9   NaN
9        10   NaN
10       11   NaN
11       12   NaN
12       13   NaN

Or use parameter min_periods=1 for count with intervals 1, 1-2, 1-3 ... for fill first NaNs:

df['a'] = df['Numbers'].rolling(10, min_periods=1).sum()
print (df)
    Numbers     a
0         1   1.0
1         2   3.0
2         3   6.0
3         4  10.0
4         5  15.0
5         6  21.0
6         7  28.0
7         8  36.0
8         9  45.0
9        10  55.0
10       11  65.0
11       12  75.0
12       13  85.0

EDIT:

If is possible add next values - here range 1 to 61 then is possible create rolling and then remove last NaNs rows:

df = pd.DataFrame({'Numbers':range(1, 61)})

df['a'] = df['Numbers'].rolling(10).sum().shift(-9)
df = df.iloc[:50]

print (df)
    Numbers      a
0         1   55.0
1         2   65.0
2         3   75.0
3         4   85.0
4         5   95.0
5         6  105.0
6         7  115.0
7         8  125.0
8         9  135.0
9        10  145.0
10       11  155.0
11       12  165.0
12       13  175.0
13       14  185.0
14       15  195.0
15       16  205.0
16       17  215.0
17       18  225.0
18       19  235.0
19       20  245.0
20       21  255.0
21       22  265.0
22       23  275.0
23       24  285.0
24       25  295.0
25       26  305.0
26       27  315.0
27       28  325.0
28       29  335.0
29       30  345.0
30       31  355.0
31       32  365.0
32       33  375.0
33       34  385.0
34       35  395.0
35       36  405.0
36       37  415.0
37       38  425.0
38       39  435.0
39       40  445.0
40       41  455.0
41       42  465.0
42       43  475.0
43       44  485.0
44       45  495.0
45       46  505.0
46       47  515.0
47       48  525.0
48       49  535.0
49       50  545.0

Upvotes: 2

Related Questions