Reputation: 513
I have a df like this:
DATE PP
0 2011-12-20 07:00:00 0.0
1 2011-12-20 08:00:00 0.0
2 2011-12-20 09:00:00 2.0
3 2011-12-20 10:00:00 0.0
4 2011-12-20 11:00:00 0.0
5 2011-12-20 12:00:00 0.0
6 2011-12-20 13:00:00 0.0
7 2011-12-20 14:00:00 5.0
8 2011-12-20 15:00:00 0.0
9 2011-12-20 16:00:00 0.0
10 2011-12-20 17:00:00 2.0
11 2011-12-20 18:00:00 0.0
12 2011-12-20 19:00:00 0.0
13 2011-12-20 20:00:00 1.0
14 2011-12-20 21:00:00 0.0
15 2011-12-20 22:00:00 0.0
16 2011-12-20 23:00:00 0.0
17 2011-12-21 00:00:00 0.0
18 2011-12-21 01:00:00 3.0
19 2011-12-21 02:00:00 0.0
20 2011-12-21 03:00:00 0.0
21 2011-12-21 04:00:00 0.0
22 2011-12-21 05:00:00 0.0
23 2011-12-21 06:00:00 5.0
24 2011-12-21 07:00:00 0.0
... .... ... ...
75609 2020-08-05 16:00:00 0.0
75610 2020-08-05 19:00:00 0.0
[75614 rows x 2 columns]
I want the cumulative values of PP
column between 2 specific hourly dates in different days. I want the sum of every 07:00:00 from one day to the 07:00:00 of the next day. For example i want the cumulative values of PP from 2011-12-20 07:00:00 to 2011-12-21 07:00:00:
Expected result:
DATE CUMULATIVE VALUES PP
0 2011-12-20 18
1 2011-12-21 5
2 2011-12-22 10
etc... etc... ...
I tried this:
df['DAY'] = df['DATE'].dt.strftime('%d')
cumulatives=pd.DataFrame(df.groupby(['DAY'])['PP'].sum())
But this only sums the entire day, not between 7:00:00 to 7:00:00 of days.
Data:
{'DATE': ['2011-12-20 07:00:00', '2011-12-20 08:00:00', '2011-12-20 09:00:00',
'2011-12-20 10:00:00', '2011-12-20 11:00:00', '2011-12-20 12:00:00',
'2011-12-20 13:00:00', '2011-12-20 14:00:00', '2011-12-20 15:00:00',
'2011-12-20 16:00:00', '2011-12-20 17:00:00', '2011-12-20 18:00:00',
'2011-12-20 19:00:00', '2011-12-20 20:00:00', '2011-12-20 21:00:00',
'2011-12-20 22:00:00', '2011-12-20 23:00:00', '2011-12-21 00:00:00',
'2011-12-21 01:00:00', '2011-12-21 02:00:00', '2011-12-21 03:00:00',
'2011-12-21 04:00:00', '2011-12-21 05:00:00', '2011-12-21 06:00:00',
'2011-12-21 07:00:00', '2020-08-05 16:00:00', '2020-08-05 19:00:00'],
'PP': [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0,
0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]}
Upvotes: 0
Views: 69
Reputation:
One way is to subtract 7hours from date so that each day starts at 17:00 of the previous day; then groupby.sum
fetches the desired output:
df['DATE'] = pd.to_datetime(df['DATE'])
out = df.groupby(df['DATE'].sub(pd.to_timedelta('7h')).dt.date)['PP'].sum().reset_index(name='SUM')
Output:
DATE SUM
0 2011-12-20 18.0
1 2011-12-21 0.0
2 2020-08-05 0.0
Upvotes: 2