Tommy Do
Tommy Do

Reputation: 65

Pandas - rolling sum with missing date

My data is like this

group_name   group_date    value
A            2021-01-01     1
A            2021-01-02     5
A            2021-01-03     3
A            2021-01-04     1.5
A            2021-01-05     3
B            2021-01-01     1
B            2021-01-04     2
B            2021-01-05     9

What I want is to aggregate the value of each group (group_name + group_date) during the past 2 days (today and yesterday), and show the missing dates as well. The first value of each group can be NaN or itself, it does not really matter. Below is the ideal result:

group_name   group_date    value
A            2021-01-01     NaN
A            2021-01-02     6
A            2021-01-03     8
A            2021-01-04     4.5
A            2021-01-05     4.5
B            2021-01-01     NaN
B            2021-01-02     1
B            2021-01-03     0
B            2021-01-04     2
B            2021-01-05     11

I wonder whether there is anyway to do this efficiently in Pandas

Upvotes: 2

Views: 757

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

Try asfreq and rolling:

# this assumes that `group_date` is `datetime` type
(df.set_index('group_date')
   .groupby('group_name')['value']
   .apply(lambda x: x.asfreq('1D').rolling('2D').sum())
   .reset_index()
)

Output:

  group_name group_date  value
0          A 2021-01-01    1.0
1          A 2021-01-02    6.0
2          A 2021-01-03    8.0
3          A 2021-01-04    4.5
4          A 2021-01-05    4.5
5          B 2021-01-01    1.0
6          B 2021-01-02    1.0
7          B 2021-01-03    NaN
8          B 2021-01-04    2.0
9          B 2021-01-05   11.0

Upvotes: 3

Related Questions