Reputation: 53
I am a novice user of Pandas. I have a dataframe that looks like this:
days rainfall
1 3.51
2 1.32
3 0
4 0
5 0
6 0
7 0
8 0
9 0.03
10 0
11 0
12 0.17
13 0.23
14 0.02
15 0
16 0
17 0
18 0.03
19 0.02
20 0
21 0
I would like to add a column (let's call it 'cumulative') that shows the cumulative rainfall values for every week. In other words, I want to calculate the cumulative values for the first seven days (1-7), then the second set of seven days (8-14), and so on.
The end product would look like this:
days rainfall cumulative
1 3.51 4.83
2 1.32 0.45
3 0 0.05
4 0
5 0
6 0
7 0
8 0
9 0.03
10 0
11 0
12 0.17
13 0.23
14 0.02
15 0
16 0
17 0
18 0.03
19 0.02
20 0
21 0
So far I've tried calling rolling() with sum(), but I do not get what I want:
df['cumulative'] = df['rainfall'].rolling(min_periods=7, window=7).sum()
Grateful for any tips or advice!
Upvotes: 2
Views: 2430
Reputation: 42916
If I understand you correctly, you want GroupBy.transform:
# create groups of 7 days each with floor division
grps = df['days'].sub(1).floordiv(7)
# get the cumulative sum per group
df['cumsum'] = df.groupby(grps)['rainfall'].transform('sum')
days rainfall cumsum
0 1 3.51 4.83
1 2 1.32 4.83
2 3 0.00 4.83
3 4 0.00 4.83
4 5 0.00 4.83
5 6 0.00 4.83
6 7 0.00 4.83
7 8 0.00 0.45
8 9 0.03 0.45
9 10 0.00 0.45
10 11 0.00 0.45
11 12 0.17 0.45
12 13 0.23 0.45
13 14 0.02 0.45
14 15 0.00 0.05
15 16 0.00 0.05
16 17 0.00 0.05
17 18 0.03 0.05
18 19 0.02 0.05
19 20 0.00 0.05
20 21 0.00 0.05
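The column above repeats each week's total on every row of that week. If by "cumulative" you instead want a running total that resets at each week boundary, the same grouping works with cumsum — a sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'days': range(1, 22),
    'rainfall': [3.51, 1.32, 0, 0, 0, 0, 0,
                 0, 0.03, 0, 0, 0.17, 0.23, 0.02,
                 0, 0, 0, 0.03, 0.02, 0, 0],
})

# same week groups as above: 0 for days 1-7, 1 for days 8-14, ...
grps = df['days'].sub(1).floordiv(7)

# running total within each week, resetting at every week boundary
df['running'] = df.groupby(grps)['rainfall'].cumsum()
```

The last row of each week then matches the weekly totals (4.83, 0.45, 0.05).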
Upvotes: 1
Reputation: 808
EDIT: Another method that works without DateTime indices is pd.cut().
df.groupby(pd.cut(df.days, bins=3, precision=0))["rainfall"].sum()
days
(1.0, 8.0] 4.83
(8.0, 14.0] 0.45
(14.0, 21.0] 0.05
The cut method lets you split a column's values into a fixed number of interval bins.
pd.cut(df.days, bins=3)
is a way of saying "take the Series df["days"] and split it into three equal-width chunks". If you run that code alone, you see:
0 (1.0, 8.0]
1 (1.0, 8.0]
2 (1.0, 8.0]
...
19 (14.0, 21.0]
20 (14.0, 21.0]
It's labeling each row in your DataFrame with what bin it belongs in. You can then use that as an argument in a groupby statement, just like any other column attribute, and apply an aggregate function.
Putting ["rainfall"] outside the groupby statement is a way of saying, "this is the column I want the sum of" (i.e., don't sum the days). You could alternatively write it first, as df["rainfall"].groupby(...), if that's more intuitive. (It's great, and also frustrating, that pandas has more than one right way to do things.)
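With bins=3, pandas computes the bin edges itself, which is why they come out as floats. If you want the weeks pinned exactly at days 7, 14, and 21, you can pass explicit edges instead — a sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'days': range(1, 22),
    'rainfall': [3.51, 1.32, 0, 0, 0, 0, 0,
                 0, 0.03, 0, 0, 0.17, 0.23, 0.02,
                 0, 0, 0, 0.03, 0.02, 0, 0],
})

# explicit edges give exact week intervals: (0, 7], (7, 14], (14, 21]
weekly = df.groupby(pd.cut(df['days'], bins=[0, 7, 14, 21]))['rainfall'].sum()
```

The resulting Series is indexed by the three intervals, with one weekly total each.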
ORIGINAL ANSWER:
For aggregate statistics, you can use pd.resample(). It's a DateTime index method (I had to coerce it a bit here, but usually you'll have more to go on with weather timestamps).
df.resample("W").sum()["rainfall"]
is the code to downsample days to weeks and aggregate values.
In this case, I constructed a DataFrame from a dictionary and cast the index to DateTime format to use the resample method:
df = pd.DataFrame(
    data={
        "days": list(range(1, 22)),
        "rainfall": [3.51, 1.32, 0, 0, 0, 0, 0,
                     0, 0.03, 0, 0, 0.17, 0.23, 0.02,
                     0, 0, 0, 0.03, 0.02, 0, 0]},
    index=pd.to_datetime(list(range(1, 22)), format="%d",
                         errors="coerce"))
That gets you:
1900-01-07 4.83
1900-01-14 0.45
1900-01-21 0.05
Freq: W-SUN, Name: rainfall, dtype: float64
Again, you'd want to adjust the year and month as appropriate, but the nice thing about resample is that you can easily aggregate by predefined time intervals (week, days, minutes, etc.) and custom spans.
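With real timestamps you can skip the coercion entirely. A minimal sketch assuming daily readings that happen to start on Monday 2021-01-04 (an arbitrary date, chosen so the weeks line up with the question's groups):

```python
import pandas as pd

rainfall = [3.51, 1.32, 0, 0, 0, 0, 0,
            0, 0.03, 0, 0, 0.17, 0.23, 0.02,
            0, 0, 0, 0.03, 0.02, 0, 0]

# one reading per day, indexed by an actual DatetimeIndex
s = pd.Series(rainfall, index=pd.date_range('2021-01-04', periods=21, freq='D'))

# weekly totals; 'W-SUN' closes each week on Sunday
weekly = s.resample('W-SUN').sum()
```

Because the series starts on a Monday, the three Sunday-labeled bins cover days 1-7, 8-14, and 15-21 exactly.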
Upvotes: 0
Reputation: 59701
You can do that like this:
import pandas as pd
df = pd.DataFrame([
    [1, 3.51],
    [2, 1.32],
    [3, 0],
    [4, 0],
    [5, 0],
    [6, 0],
    [7, 0],
    [8, 0],
    [9, 0.03],
    [10, 0],
    [11, 0],
    [12, 0.17],
    [13, 0.23],
    [14, 0.02],
    [15, 0],
    [16, 0],
    [17, 0],
    [18, 0.03],
    [19, 0.02],
    [20, 0],
    [21, 0]], columns=['days', 'rainfall'])
result = df['rainfall'].groupby((df['days'] - 1) // 7).sum().reset_index(drop=True)
print(result)
# 0 4.83
# 1 0.45
# 2 0.05
# Name: rainfall, dtype: float64
Upvotes: 1