Jackson Tale
Jackson Tale

Reputation: 25842

rolling_sum on business day and return new dataframe with date as index

I have such a DataFrame:

                     A
2016-01-01 00:00:00  0
2016-01-01 12:00:00  1

2016-01-02 00:00:00  2
2016-01-02 12:00:00  3
2016-01-03 00:00:00  4
2016-01-03 12:00:00  5

2016-01-04 00:00:00  6
2016-01-04 12:00:00  7
2016-01-05 00:00:00  8
2016-01-05 12:00:00  9

The reason I separate 2016-01-02 00:00:00 to 2016-01-03 12:00:00 is that, those two days are weekends.


So here is what I wish to do:

I wish to rolling_sum with window = 2 business days.

For example, I wish to sum

                     A
2016-01-04 00:00:00  6
2016-01-04 12:00:00  7
2016-01-05 00:00:00  8
2016-01-05 12:00:00  9

and then sum (we skip any non-business days)

                     A
2016-01-01 00:00:00  0
2016-01-01 12:00:00  1

2016-01-04 00:00:00  6
2016-01-04 12:00:00  7

And the result is

                     A
2016-01-01           Nan
2016-01-04           14
2016-01-05           30

How can I achieve that?

I tried rolling_sum(df, window=2, freq=BDay(1)), it seems it just pick one row of the same day, but not sum the two rows (00:00 and 12:00) within the same day.

Upvotes: 0

Views: 1862

Answers (1)

Stefan
Stefan

Reputation: 42905

You could first select only business days, resample to (business) daily frequency for the remaining data points and sum, and then apply rolling_sum:

Starting with some sample data:

df = pd.DataFrame(data={'A': np.random.randint(0, 10, 500)}, index=pd.date_range(datetime(2016,1,1), freq='6H', periods=500))

                     A
2016-01-01 00:00:00  6
2016-01-01 06:00:00  9
2016-01-01 12:00:00  3
2016-01-01 18:00:00  9
2016-01-02 00:00:00  7
2016-01-02 06:00:00  5
2016-01-02 12:00:00  8
2016-01-02 18:00:00  6
2016-01-03 00:00:00  2
2016-01-03 06:00:00  0
2016-01-03 12:00:00  0
2016-01-03 18:00:00  0
2016-01-04 00:00:00  5
2016-01-04 06:00:00  4
2016-01-04 12:00:00  1
2016-01-04 18:00:00  4
2016-01-05 00:00:00  6
2016-01-05 06:00:00  9
2016-01-05 12:00:00  7
2016-01-05 18:00:00  2
....

First select the values on business days:

tsdays = df.index.values.astype('<M8[D]')
bdays = pd.bdate_range(tsdays[0], tsdays[-1]).values.astype('<M8[D]')
df = df[np.in1d(tsdays, bdays)]

Then apply rolling_sum() to the resampled data, where each value represents the sum for an individual business day:

pd.rolling_sum(df.resample('B', how='sum'), window=2)

to get:

             A
2016-01-01 NaN
2016-01-04  41
2016-01-05  38
2016-01-06  56
2016-01-07  52
2016-01-08  37

See also [here] for the type conversion and 1[this question]2 for the business day extraction.

Upvotes: 1

Related Questions