I have a pandas DataFrame with a monthly DatetimeIndex and a column of outstanding orders, like below:
Date | orders |
---|---|
1991-01-01 | nan |
1991-02-01 | nan |
1991-03-01 | 24 |
1991-04-01 | nan |
1991-05-01 | nan |
1991-06-01 | nan |
1991-07-01 | nan |
1991-08-01 | 34 |
1991-09-01 | nan |
1991-10-01 | nan |
1991-11-01 | 22 |
1991-12-01 | nan |
I want to linearly interpolate the values to fill the NaNs. However, the interpolation has to be applied within fixed (non-rolling) 6-month blocks. For example, one 6-month block would be all the rows from 1991-01-01 to 1991-06-01. Within each block I want forward and backward linear imputation, where leading and trailing NaNs ramp linearly toward a value of 0 just outside the block. For the dataset above, this is how I would like the end result to look:
Date | orders |
---|---|
1991-01-01 | 8 |
1991-02-01 | 16 |
1991-03-01 | 24 |
1991-04-01 | 18 |
1991-05-01 | 12 |
1991-06-01 | 6 |
1991-07-01 | 17 |
1991-08-01 | 34 |
1991-09-01 | 30 |
1991-10-01 | 26 |
1991-11-01 | 22 |
1991-12-01 | 11 |
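The expected output is consistent with padding each block with a 0 just before its first month and just after its last month, then interpolating linearly. A minimal sketch for the first block alone, with the values hard-coded from the first table, shows the arithmetic: going from 0 to 24 over three steps gives 8, 16, 24, and going from 24 back to 0 over four steps gives 18, 12, 6.

```python
import numpy as np
import pandas as pd

# First 6-month block from the question (1991-01-01 .. 1991-06-01)
block = pd.Series([np.nan, np.nan, 24, np.nan, np.nan, np.nan])

# Pad with a 0 just before the first month and just after the last month,
# interpolate linearly over positions, then drop the two padding values
padded = pd.Series([0.0] + block.tolist() + [0.0])
filled = padded.interpolate().iloc[1:-1].reset_index(drop=True)

print(filled.tolist())  # [8.0, 16.0, 24.0, 18.0, 12.0, 6.0]
```

The second block works the same way: 0 to 34 gives 17, 34 to 22 gives 30, 26, and 22 to 0 gives 11.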
However, I am lost on how to do this in pandas. Any ideas?
The idea is to group by 6-month blocks, prepend and append a 0 to each group, interpolate linearly, and then remove the added first and last values from each group:
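```python
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# Pad each block with a 0 at both ends, interpolate linearly, then drop the padding;
# the result is given the group's own index so transform maps it back to the right rows
f = lambda x: pd.Series(pd.Series([0] + x.tolist() + [0], dtype=float)
                        .interpolate().iloc[1:-1].to_numpy(), index=x.index)
df['orders'] = (df.groupby(pd.Grouper(freq='6MS', key='Date'))['orders']
                  .transform(f))
print(df)
```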
```
         Date  orders
0  1991-01-01     8.0
1  1991-02-01    16.0
2  1991-03-01    24.0
3  1991-04-01    18.0
4  1991-05-01    12.0
5  1991-06-01     6.0
6  1991-07-01    17.0
7  1991-08-01    34.0
8  1991-09-01    30.0
9  1991-10-01    26.0
10 1991-11-01    22.0
11 1991-12-01    11.0
```
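The question describes `Date` as a DatetimeIndex, whereas the answer assumes a `Date` column. A sketch for the index case follows. It also swaps `pd.Grouper(freq='6MS')` for explicit calendar half-year keys (year plus `(month - 1) // 6`); this is an assumption that blocks should always be Jan-Jun and Jul-Dec, since `pd.Grouper` bins may start from the first month present in the data rather than from January. `fill_block` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Data from the question, with Date as a monthly DatetimeIndex
idx = pd.date_range("1991-01-01", periods=12, freq="MS")
orders = [np.nan, np.nan, 24, np.nan, np.nan, np.nan,
          np.nan, 34, np.nan, np.nan, 22, np.nan]
df = pd.DataFrame({"orders": orders}, index=pd.Index(idx, name="Date"))


def fill_block(s: pd.Series) -> pd.Series:
    """Hypothetical helper: interpolate one block as if padded with 0 on both ends."""
    padded = np.concatenate(([0.0], s.to_numpy(dtype=float), [0.0]))
    filled = pd.Series(padded).interpolate().to_numpy()[1:-1]
    # Re-attach the block's own index so transform aligns the result correctly
    return pd.Series(filled, index=s.index)


# Assumed blocking: calendar half-years keyed by (year, half), half being 0 or 1
half = (df.index.month - 1) // 6
df["orders"] = df.groupby([df.index.year, half])["orders"].transform(fill_block)
print(df)
```

With this data it yields the same values as the answer's output; the only differences are where the dates live (index vs. column) and how the block boundaries are chosen.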