Ilya Vishnyakov
Ilya Vishnyakov

Reputation: 69

Daily calculations in intraday data

Let's say I have a DataFrame with date_time index:

date_time             a    b
2020-11-23 04:00:00   10   5
2020-11-23 05:00:00   11   5
2020-11-23 06:00:00   12   5
2020-11-24 04:30:00   13   6
2020-11-24 05:30:00   14   6
2020-11-24 06:30:00   15   6
2020-11-25 06:00:00   16   7
2020-11-25 07:00:00   17   7
2020-11-25 08:00:00   18   7

"a" column is intraday data (every row - different value). "b" column - DAILY data - same data during the current day. I need to make some calculations with "b" (daily) column and create "c" column with the result. For example, sum for two last days.

Result:

date_time             a    b   c
2020-11-23 04:00:00   10   5   NaN
2020-11-23 05:00:00   11   5   NaN
2020-11-23 06:00:00   12   5   NaN
2020-11-24 04:30:00   13   6   11
2020-11-24 05:30:00   14   6   11
2020-11-24 06:30:00   15   6   11
2020-11-25 06:00:00   16   7   13
2020-11-25 07:00:00   17   7   13
2020-11-25 08:00:00   18   7   13

I guesss I should use something like df['c'] = df.resample('D').b.rolling(3).sum ... but I got "NaN" values in "c".

Could you help me? Thanks!

Upvotes: 0

Views: 219

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150815

One thing you can do is to drop duplicates on the date and work on that:

# get the dates
df['date'] = df['date_time'].dt.normalize()

df['c'] = (df.drop_duplicates('date')['b']     # drop duplicates on dates
             .rolling(2).sum()                 # rolling sum
          )
 
df['c'] = df['c'].ffill()                      # fill the missing data

Output:

            date_time   a  b       date     c
0 2020-11-23 04:00:00  10  5 2020-11-23   NaN
1 2020-11-23 05:00:00  11  5 2020-11-23   NaN
2 2020-11-23 06:00:00  12  5 2020-11-23   NaN
3 2020-11-24 04:30:00  13  6 2020-11-24  11.0
4 2020-11-24 05:30:00  14  6 2020-11-24  11.0
5 2020-11-24 06:30:00  15  6 2020-11-24  11.0
6 2020-11-25 06:00:00  16  7 2020-11-25  13.0
7 2020-11-25 07:00:00  17  7 2020-11-25  13.0
8 2020-11-25 08:00:00  18  7 2020-11-25  13.0

Upvotes: 1

Related Questions