Reputation: 344
Given the following table:
import pandas as pd
df = pd.DataFrame({'pers_no': [1, 1, 2], 'start_date': ['2000-03-01', '2000-06-01', '2001-04-01'], 'end_date': ['2000-05-01', '2000-07-01', '2001-06-01'], 'value': [199, 219, 249]})
   pers_no  start_date    end_date  value
0        1  2000-03-01  2000-05-01    199
1        1  2000-06-01  2000-07-01    219
2        2  2001-04-01  2001-06-01    249
How can I expand the DataFrame so that it gets one extra row per period (e.g. per month) between start_date and end_date, inclusive? The result should look like this:
   pers_no        date  value
0        1  2000-03-01    199
1        1  2000-04-01    199
2        1  2000-05-01    199
3        1  2000-06-01    219
4        1  2000-07-01    219
5        2  2001-04-01    249
6        2  2001-05-01    249
7        2  2001-06-01    249
Upvotes: 0
Views: 165
Reputation: 4929
You can make a new column holding each row's date_range and then explode it into one row per date:
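import pandas as pd
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime)  # strings -> datetimes
def get_dt_range(dt):
    # month-start dates from start_date through the month containing end_date
    return pd.date_range(dt['start_date'], dt['end_date'] + pd.offsets.MonthEnd(), freq='MS')
df['date'] = df[['start_date', 'end_date']].apply(get_dt_range, axis=1)
df.explode('date')[['pers_no', 'date', 'value']]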
Output:
   pers_no        date  value
0        1  2000-03-01    199
0        1  2000-04-01    199
0        1  2000-05-01    199
1        1  2000-06-01    219
1        1  2000-07-01    219
2        2  2001-04-01    249
2        2  2001-05-01    249
2        2  2001-06-01    249
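A quick check of the date_range call inside get_dt_range, using the first row's dates. Because the end bound of date_range is inclusive, the MonthEnd offset does not change the result when end_date is a month start. One caveat: adding MonthEnd() to a date that is already a month end rolls it forward a whole month, which would add an extra row. The date 2000-05-31 below is a hypothetical value, not from the question.

```python
import pandas as pd

start = pd.Timestamp('2000-03-01')
end = pd.Timestamp('2000-05-01')

# freq='MS' yields month-start dates, and the end bound is inclusive
plain = pd.date_range(start, end, freq='MS')
# get_dt_range's version: end is rolled forward to 2000-05-31 first
shifted = pd.date_range(start, end + pd.offsets.MonthEnd(), freq='MS')
print(list(plain.strftime('%Y-%m-%d')))  # ['2000-03-01', '2000-04-01', '2000-05-01']
print(plain.equals(shifted))             # True

# Caveat: a date already on a month end is rolled to the following month end
rolled = pd.Timestamp('2000-05-31') + pd.offsets.MonthEnd()
print(rolled.date())                     # 2000-06-30
```

So for end dates that may fall on the last day of a month, passing dt['end_date'] straight to date_range is the safer choice.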
Upvotes: 2
Reputation: 11504
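You can build one small DataFrame per row and concatenate them. Note that freq='d' produces one row per day; freq='MS' would give the monthly rows asked for: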
import pandas as pd
pd.concat([pd.DataFrame({'Date': pd.date_range(row.start_date, row.end_date, freq='d'),
                         'value': row.value,
                         'pers_no': row.pers_no},
                        columns=['Date', 'value', 'pers_no'])
           for _, row in df.iterrows()], ignore_index=True)
which gives:
          Date  value  pers_no
0   2000-03-01    199        1
1   2000-03-02    199        1
2   2000-03-03    199        1
3   2000-03-04    199        1
4   2000-03-05    199        1
..         ...    ...      ...
150 2001-05-28    249        2
151 2001-05-29    249        2
152 2001-05-30    249        2
153 2001-05-31    249        2
154 2001-06-01    249        2
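A sketch of the same concat pattern adapted to the monthly output in the question, assuming the question's df. It swaps freq='d' for freq='MS', and iterrows for itertuples, which yields lightweight namedtuples instead of building a Series per row. The name monthly is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'pers_no': [1, 1, 2],
                   'start_date': ['2000-03-01', '2000-06-01', '2001-04-01'],
                   'end_date': ['2000-05-01', '2000-07-01', '2001-06-01'],
                   'value': [199, 219, 249]})

# One small frame per input row: the scalar pers_no/value are broadcast
# along the month-start dates between start_date and end_date (inclusive)
monthly = pd.concat(
    [pd.DataFrame({'pers_no': row.pers_no,
                   'date': pd.date_range(row.start_date, row.end_date, freq='MS'),
                   'value': row.value})
     for row in df.itertuples(index=False)],
    ignore_index=True)
print(monthly)
```

This produces the eight rows from the question with a fresh 0..7 index, and the date column keeps a datetime64 dtype.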
Upvotes: 1