Reputation: 4113
import pandas as pd
date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
df = pd.Series(range(12), index=date_index)
dates = date_index[1::2]
The Series df
is of monthly frequency, and we want to resample by adding up the value between the dates as given by the dates
variable.
df
is:
2010-01-31 0
2010-02-28 1
2010-03-31 2
2010-04-30 3
2010-05-31 4
2010-06-30 5
2010-07-31 6
2010-08-31 7
2010-09-30 8
2010-10-31 9
2010-11-30 10
2010-12-31 11
Freq: M, dtype: int64
dates
is
DatetimeIndex(['2010-02-28', '2010-04-30', '2010-06-30', '2010-08-31',
'2010-10-31', '2010-12-31'],
dtype='datetime64[ns]', freq='2M')
The expected result should be:
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
Upvotes: 2
Views: 174
Reputation: 3711
For your specific example, where df[0] = 0
, it is a simple resample
with sum()
aggregation, skipping df[0]
.
df_resampled = df[1::].resample('2M').sum()
print(df_resampled)
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
Freq: 2M, dtype: int64
In case df[0] != 0
, you can still make an easy workaround by adding df[0]
to the first element of df_resampled
:
df_resampled[0] = df_resampled[0] + df[0]
In case you want general resampling with period of two month, you can try to use the loffset
parameter of resample
and provide a function returning pd.Timedelta
objects such, that it "floors" to the last day of each individual month. (See here for how to get montly periods for pd.Timedelta
)
Upvotes: 1
Reputation: 863156
Idea is replace not matched values of date
s to missing values by Series.where
with bfill
for bacj filling missing values and then aggregate sum
:
date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
s = pd.Series(range(12), index=date_index)
dates = date_index[1::2]
a = s.index.to_series().where(s.index.isin(dates)).bfill()
out = s.groupby(a).sum()
print(out)
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
dtype: int64
Upvotes: 1
Reputation: 30609
Not a general resampling solution but for your concrete question of adding up the values between the dates you could use
res = df.cumsum()[dates].diff()
res[0] = df[dates[0]]
res = res.astype(df.dtype)
Result:
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
Upvotes: 2