Reputation: 9
For example (input pandas dataframe):
start_date end_date value
0 2018-05-17 2018-05-20 4
1 2018-05-22 2018-05-27 12
2 2018-05-14 2018-05-21 8
I want it to divide the value by the # of intervals present in the data (e.g. 2018-05-12 to 2018-05-27 has 6 days, 12 / 6 = 2) and then create a time series data like the following:
date value
0 2018-05-14 1
1 2018-05-15 1
2 2018-05-16 1
3 2018-05-17 2
4 2018-05-18 2
5 2018-05-19 2
6 2018-05-20 2
7 2018-05-21 1
8 2018-05-22 2
9 2018-05-23 2
10 2018-05-24 2
11 2018-05-25 2
12 2018-05-26 2
13 2018-05-27 2
is this possible to do without an inefficient loop through every row using pandas? Is there also a name for this method?
Upvotes: 1
Views: 72
Reputation: 862521
You can use:
#convert to datetimes if necessary
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
For each row generate list of Series
by date_range
, then divide their length and aggregate by groupby
with sum
:
dfs = [pd.Series(r.value, pd.date_range(r.start_date, r.end_date)) for r in df.itertuples()]
df = (pd.concat([x / len(x) for x in dfs])
.groupby(level=0)
.sum()
.rename_axis('date')
.reset_index(name='val'))
print (df)
date val
0 2018-05-14 1.0
1 2018-05-15 1.0
2 2018-05-16 1.0
3 2018-05-17 2.0
4 2018-05-18 2.0
5 2018-05-19 2.0
6 2018-05-20 2.0
7 2018-05-21 1.0
8 2018-05-22 2.0
9 2018-05-23 2.0
10 2018-05-24 2.0
11 2018-05-25 2.0
12 2018-05-26 2.0
13 2018-05-27 2.0
Upvotes: 1