Reputation: 57
I want to forward fill rows for the next 3 consecutive months but stops if a new data row is available for the same ID within that 3 months window.
Here is a sample data
id date value1 Value2
1 2016-09-01 5 2
1 2016-11-01 7 15
2 2015-09-01 11 6
2 2015-12-01 13 4
2 2016-05-01 3 5
I would like to get
id date value1 value2
1 2016-09-01 5 2
1 2016-10-01 5 2
1 2016-11-01 7 15
1 2016-12-01 7 15
1 2017-01-01 7 15
1 2017-02-01 7 15
2 2015-09-01 11 6
2 2015-10-01 11 6
2 2015-11-01 11 6
2 2015-12-01 13 4
2 2016-01-01 13 4
2 2016-02-01 13 4
2 2016-03-01 13 4
2 2016-05-01 3 5
...
I tried a bunch of forward-fill methods and crossed join with the calendar but couldn't figure it out. Any help will be appreciated!
Upvotes: 1
Views: 206
Reputation: 480
I think it might be done like this:
import pandas as pd
import datetime as dt
df = pd.DataFrame({
'id': [1, 1, 2, 2, 2],
'date': [
dt.datetime.fromisoformat(s) for s in [
'2016-09-01',
'2016-11-01',
'2015-09-01',
'2015-12-01',
'2016-05-01'
]
],
'value1': [5, 7, 11, 13, 2],
'value2': [2, 15, 6, 4, 5]
}).set_index('id')
result = []
for _id, data in df.groupby('id'):
tmp_df = pd.DataFrame({
'date': pd.period_range(
start=min(data.date),
end=max(data.date + dt.timedelta(days=31 * 3)),
freq='M'
).to_timestamp()
})
tmp_df = tmp_df.join(data.set_index('date'), on='date')
tmp_df['id'] = _id
result.append(tmp_df.set_index('id'))
result = pd.concat(result).fillna(method='ffill', limit=3).dropna()
print(result)
Result:
date value1 value2
id
1 2016-09-01 5.0 2.0
1 2016-10-01 5.0 2.0
1 2016-11-01 7.0 15.0
1 2016-12-01 7.0 15.0
1 2017-01-01 7.0 15.0
1 2017-02-01 7.0 15.0
2 2015-09-01 11.0 6.0
2 2015-10-01 11.0 6.0
2 2015-11-01 11.0 6.0
2 2015-12-01 13.0 4.0
2 2016-01-01 13.0 4.0
2 2016-02-01 13.0 4.0
2 2016-03-01 13.0 4.0
2 2016-05-01 2.0 5.0
2 2016-06-01 2.0 5.0
2 2016-07-01 2.0 5.0
2 2016-08-01 2.0 5.0
Upvotes: 2