Reputation: 141
My data set is much larger so I have simplified it.
I want to convert the dataframe into a time-series.
The bit I am stuck on:
I have overlapping date ranges, where I have a smaller date range inside a larger one, as shown by row 0 and row 1, where row 1 and row 2 are inside the date range of row 0.
df:
date1 date2 reduction
0 2016-01-01 - 2016-01-05 7.0
1 2016-01-02 - 2016-01-03 5.0
2 2016-01-03 - 2016-01-04 6.0
3 2016-01-05 - 2016-01-12 10.0
How I want the output to look:
date1 date2 reduction
0 2016-01-01 2016-01-02 7.0
1 2016-01-02 2016-01-03 5.0
2 2016-01-03 2016-01-04 6.0
3 2016-01-04 2016-01-05 7.0
4 2016-01-05 2016-01-06 10.0
5 2016-01-06 2016-01-07 10.0
6 2016-01-07 2016-01-08 10.0
7 2016-01-08 2016-01-09 10.0
8 2016-01-09 2016-01-10 10.0
9 2016-01-10 2016-01-11 10.0
10 2016-01-11 2016-01-12 10.0
Upvotes: 0
Views: 90
Reputation: 35115
I prepared two consecutive columns of data with minimum and maximum dates and ran updates from the original DF.
import pandas as pd
import numpy as np
import io
data='''
date1 date2 reduction
0 2016-01-01 2016-01-05 7.0
1 2016-01-02 2016-01-03 5.0
2 2016-01-03 2016-01-04 6.0
3 2016-01-05 2016-01-12 10.0
'''
df = pd.read_csv(io.StringIO(data), sep=' ', index_col=0)
date_1 = pd.date_range(df.date1.min(), df.date2.max())
date_2 = pd.date_range(df.date1.min(), df.date2.max())
df2 = pd.DataFrame({'date1':date_1, 'date2':date_2, 'reduction':[0]*len(date_1)})
df2['date2'] = df2.date2.shift(-1)
df2.dropna(inplace=True)
for i in range(len(df)):
df2['reduction'][(df2.date1 >= df.date1.iloc[i]) & (df2.date2 <= df.date2.iloc[i])] = df.reduction.iloc[i]
df2
date1 date2 reduction
0 2016-01-01 2016-01-02 7
1 2016-01-02 2016-01-03 5
2 2016-01-03 2016-01-04 6
3 2016-01-04 2016-01-05 7
4 2016-01-05 2016-01-06 10
5 2016-01-06 2016-01-07 10
6 2016-01-07 2016-01-08 10
7 2016-01-08 2016-01-09 10
8 2016-01-09 2016-01-10 10
9 2016-01-10 2016-01-11 10
10 2016-01-11 2016-01-12 10
Upvotes: 1