PolarVertex
PolarVertex

Reputation: 57

Pandas create new date rows and forward fill column values for maximum next 3 consecutive months

I want to forward fill rows for the next 3 consecutive months but stops if a new data row is available for the same ID within that 3 months window.

Here is a sample data

id     date       value1  Value2
 1  2016-09-01      5      2
 1  2016-11-01      7      15
 2  2015-09-01      11      6
 2  2015-12-01      13      4
 2  2016-05-01       3      5
 

I would like to get

id    date          value1 value2
 1  2016-09-01      5      2
 1  2016-10-01      5      2
 1  2016-11-01      7      15
 1  2016-12-01      7      15
 1  2017-01-01      7      15
 1  2017-02-01      7      15
 2  2015-09-01      11      6
 2  2015-10-01      11      6
 2  2015-11-01      11      6
 2  2015-12-01      13      4
 2  2016-01-01      13      4
 2  2016-02-01      13      4
 2  2016-03-01      13      4
 2  2016-05-01       3      5
...

I tried a bunch of forward-fill methods and crossed join with the calendar but couldn't figure it out. Any help will be appreciated!

Upvotes: 1

Views: 206

Answers (1)

Michal Racko
Michal Racko

Reputation: 480

I think it might be done like this:

import pandas as pd
import datetime as dt

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'date': [
        dt.datetime.fromisoformat(s) for s in [
            '2016-09-01',
            '2016-11-01',
            '2015-09-01',
            '2015-12-01',
            '2016-05-01'
        ]
    ],
    'value1': [5, 7, 11, 13, 2],
    'value2': [2, 15, 6, 4, 5]
}).set_index('id')

result = []
for _id, data in df.groupby('id'):
    tmp_df = pd.DataFrame({
        'date': pd.period_range(
            start=min(data.date),
            end=max(data.date + dt.timedelta(days=31 * 3)),
            freq='M'
        ).to_timestamp()
    })
    tmp_df = tmp_df.join(data.set_index('date'), on='date')
    tmp_df['id'] = _id
    result.append(tmp_df.set_index('id'))
result = pd.concat(result).fillna(method='ffill', limit=3).dropna()
print(result)

Result:

         date  value1  value2
id                           
1  2016-09-01     5.0     2.0
1  2016-10-01     5.0     2.0
1  2016-11-01     7.0    15.0
1  2016-12-01     7.0    15.0
1  2017-01-01     7.0    15.0
1  2017-02-01     7.0    15.0
2  2015-09-01    11.0     6.0
2  2015-10-01    11.0     6.0
2  2015-11-01    11.0     6.0
2  2015-12-01    13.0     4.0
2  2016-01-01    13.0     4.0
2  2016-02-01    13.0     4.0
2  2016-03-01    13.0     4.0
2  2016-05-01     2.0     5.0
2  2016-06-01     2.0     5.0
2  2016-07-01     2.0     5.0
2  2016-08-01     2.0     5.0

Upvotes: 2

Related Questions