I run into this a lot when modeling time series. Sometimes data are reported at different frequencies, say one series daily and one weekly. Rather than forward-filling the weekly data point into every day of the week (it is usually already the sum of all the values during that week), I'd like to forward-fill with its mean over the days it covers, or replace it with that mean. In essence, I'd like to spread the data out.
So if I have
import numpy as np
import pandas as pd
s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])
then I'd like to return
2015-01-01 1
2015-01-02 1
2015-01-03 2
2015-01-04 2
2015-01-05 2
2015-01-06 0.5
2015-01-07 0.5
2015-01-08 0.5
2015-01-09 0.5
Freq: D, dtype: float64
Any thoughts on an easy way to do this? Is a for-loop inescapable?
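For comparison, an explicit loop is not too bad if it iterates once per reported value rather than once per day. The sketch below is an illustration only; `spread_loop` is a hypothetical helper, not from the original post. It divides each non-null value evenly over itself and the NaNs that follow it.

```python
import numpy as np
import pandas as pd

def spread_loop(s: pd.Series) -> pd.Series:
    """Hypothetical loop baseline: spread each non-null value evenly
    over itself and the run of NaNs that follows it."""
    out = pd.Series(np.nan, index=s.index, dtype="float64")
    values = s.to_numpy(dtype="float64")
    # Positions of the reported (non-null) observations.
    starts = np.flatnonzero(s.notna().to_numpy())
    for i, start in enumerate(starts):
        # A block runs until the next reported value, or the end of the series.
        end = starts[i + 1] if i + 1 < len(starts) else len(values)
        out.iloc[start:end] = values[start] / (end - start)
    return out

s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])
print(spread_loop(s))
```

Any NaNs before the first reported value stay NaN, since there is nothing to spread into them.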
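Here is one way: take `.cumsum()` of the not-null mask to split the series into groups (a new group starts at each reported value), forward-fill, and then `transform` each group by dividing by its length.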
s.ffill().groupby(s.notnull().cumsum()).transform(lambda g: g / len(g))
2015-01-01 1.0
2015-01-02 1.0
2015-01-03 2.0
2015-01-04 2.0
2015-01-05 2.0
2015-01-06 0.5
2015-01-07 0.5
2015-01-08 0.5
2015-01-09 0.5
Freq: D, dtype: float64
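To see why this works, it can help to look at the intermediate pieces. The sketch below splits the one-liner into steps and, as a variation that is not part of the original answer, replaces the per-group Python lambda with the built-in `transform('size')`. That is usually faster when there are many groups, though I have not benchmarked it here.

```python
import numpy as np
import pandas as pd

s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])

# Each non-null value starts a new group: the cumulative sum of the
# not-null mask is 1, 1, 2, 2, 2, 3, 3, 3, 3 for this series.
groups = s.notnull().cumsum()

# Forward-fill so every day carries its period's reported total.
filled = s.ffill()

# Number of rows in each group, broadcast back to every row of that group.
sizes = filled.groupby(groups).transform('size')

# Divide each day's total by the length of its period.
spread = filled / sizes
print(spread)
```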