How to expand a DataFrame with extra rows based on a set time interval between a start date and an end date?

Question

Given the following table:

import pandas as pd

df = pd.DataFrame({'pers_no': [1,1,2], 'start_date': ['2000-03-01','2000-06-01', '2001-04-01'], 'end_date': ['2000-05-01','2000-07-01', '2001-06-01'], 'value': [199,219,249]})

   pers_no  start_date    end_date  value
0        1  2000-03-01  2000-05-01    199
1        1  2000-06-01  2000-07-01    219
2        2  2001-04-01  2001-06-01    249

How can I expand the DataFrame so that it has one row per interval (e.g. each month) between the start date and the end date? The result should look like this:

   pers_no        date  value
0        1  2000-03-01    199
1        1  2000-04-01    199
2        1  2000-05-01    199
3        1  2000-06-01    219
4        1  2000-07-01    219
5        2  2001-04-01    249
6        2  2001-05-01    249
7        2  2001-06-01    249

Cainã Max Couto da Silva · Accepted Answer

You can create a new column with pd.date_range and then explode it so that each date gets its own row:

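# The date columns are strings in the question; convert them so the offset arithmetic works
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime)

def get_dt_range(dt):
    # Month starts ('MS') from start_date up to the end of end_date's month
    return pd.date_range(dt['start_date'], dt['end_date'] + pd.offsets.MonthEnd(), freq='MS')

df['date'] = df[['start_date', 'end_date']].apply(get_dt_range, axis=1)
df.explode('date')[['pers_no', 'date', 'value']]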

Output:

   pers_no       date  value
0        1 2000-03-01    199
0        1 2000-04-01    199
0        1 2000-05-01    199
1        1 2000-06-01    219
1        1 2000-07-01    219
2        2 2001-04-01    249
2        2 2001-05-01    249
2        2 2001-06-01    249
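
Note that the exploded frame repeats the original index (0, 0, 0, 1, 1, ...), while the desired result in the question is numbered 0 to 7. A self-contained sketch of the same date_range + explode idea that also resets the index is shown below. The helper name expand_rows and its freq parameter are hypothetical, introduced only for illustration, and the sketch assumes the date columns arrive as strings as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'pers_no': [1, 1, 2],
    'start_date': ['2000-03-01', '2000-06-01', '2001-04-01'],
    'end_date': ['2000-05-01', '2000-07-01', '2001-06-01'],
    'value': [199, 219, 249],
})

def expand_rows(frame, freq='MS'):
    """Hypothetical helper: one row per `freq` step between start_date and end_date."""
    out = frame.copy()
    # Parse the string columns so date_range receives real timestamps.
    out['start_date'] = pd.to_datetime(out['start_date'])
    out['end_date'] = pd.to_datetime(out['end_date'])
    # Build one list of timestamps per row; date_range includes both
    # endpoints when they fall on the chosen frequency.
    out['date'] = [
        list(pd.date_range(start, end, freq=freq))
        for start, end in zip(out['start_date'], out['end_date'])
    ]
    out = out.explode('date')[['pers_no', 'date', 'value']]
    # explode yields an object column and repeats the original index,
    # so restore the datetime dtype and renumber the rows.
    out['date'] = pd.to_datetime(out['date'])
    return out.reset_index(drop=True)

result = expand_rows(df)
print(result)
```

Passing a different pandas frequency string, for example freq='D' for daily rows, should change the interval without touching the rest of the code.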
