Jossy
Jossy

Reputation: 989

Rolling sum based on all previous dates NOT previous rows sorted by date

Given the following dataframe:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |     15 |
| 27/05/2019 |     20 |
| 27/05/2019 |     15 |
| 25/06/2019 |     10 |
| 29/06/2019 |     25 |
| 01/07/2019 |     50 |
+------------+--------+

I need to get the rolling sum of all previous dates as follows:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 | NaN    |
| 27/05/2019 | 15     |
| 27/05/2019 | 15     |
| 15/06/2019 | 35     |
| 29/06/2019 | 10     |
| 01/07/2019 | 35     |
+------------+--------+

Using:

df = pd.DataFrame(
    {
        'Date': {
            0: datetime.datetime(2019, 5, 1),
            1: datetime.datetime(2019, 5, 27),
            2: datetime.datetime(2019, 5, 27),
            3: datetime.datetime(2019, 6, 15),
            4: datetime.datetime(2019, 6, 29),
            5: datetime.datetime(2019, 7, 1),
        },
        'Amount': {0: 15, 1: 20, 2: 15, 3: 10, 4: 25, 5: 50}
    }
)
df.sort_values("Date", inplace=True)
df_roll = df.rolling("28d", on="Date", closed="left").sum()

Gets me:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |    NaN |
| 27/05/2019 |     15 | 
| 27/05/2019 |     35 | <-- Should be 15
| 15/06/2019 |     35 |
| 29/06/2019 |     10 |
| 01/07/2019 |     35 |
+------------+--------+

Which isn't quite correct.

How would I get the sum of all previous dates rather than all previous rows?

Upvotes: 6

Views: 762

Answers (3)

BENY
BENY

Reputation: 323226

You can do

df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d", closed="left").sum())
df
        Date  Amount   new
0 2019-05-01      15   NaN
1 2019-05-27      20  15.0
2 2019-05-27      15  15.0
3 2019-06-15      10  35.0
4 2019-06-29      25  10.0
5 2019-07-01      50  35.0

Upvotes: 2

user17242583
user17242583

Reputation:

You could drop duplicate dates first, then do a rolling sum, then forward fill the resulting NaNs (occasioned by the duplicate removal):

df = df.assign(Amount=df.drop_duplicates(subset=['Date']).rolling("28d", on="Date", closed="left")['Amount'].sum()).ffill()

Output:

>>> df
        Date  Amount
0 2019-05-01     NaN
1 2019-05-27    15.0
2 2019-05-27    15.0
3 2019-06-15    20.0
4 2019-06-29    10.0
5 2019-07-01    35.0

Upvotes: 1

cookesd
cookesd

Reputation: 1336

One way is to aggregate your amounts by date first, then compute the rolling sum, and join this sum to the original list of dates to apply the rolling sum to all dates

# Aggregate (sum) by date
df_agged = (df.groupby('Date')['Amount'].agg(['sum'])
            .reset_index()
            .rename(columns={'sum':'Amount'}))
# Compute rolling sum
df_agged_rolling = df_agged.rolling("28d",on="Date",closed='left').sum()

# Join on original dates to apply rolling sum to duplicate dates
df_with_rolling_agg = df.join(df_agged_rolling.set_index('Date'),on='Date',
                              lsuffix='_orig',rsuffix='_rolling_sum')
df_with_rolling_agg

#         Date  Amount_orig  Amount_rolling_sum
# 0 2019-05-01           15                 NaN
# 1 2019-05-27           20                15.0
# 2 2019-05-27           15                15.0
# 3 2019-06-15           10                35.0
# 4 2019-06-29           25                10.0
# 5 2019-07-01           50                35.0

Upvotes: 1

Related Questions