Reputation: 357
Below is script for a simplified version of the df in question:
import pandas as pd
import numpy as np
from datetime import date
from datetime import datetime
df = pd.DataFrame({'date':pd.date_range(start='2020-01-01', end='2020/01/07'),
'id' : range(1,8),
'product': ['list_3','list_1','list_2', 'list_3','list_2','list_1','list_1'],
'duration' : [3,1,2,3,2,1,1],
'product_in_use': ('true'),
'end_date':['2020-01-03','2020-01-02','2020-01-04','2020-01-06','2020-01-06','2020-01-06',
'2020-01-07']})
df['duration']= pd.to_timedelta(df['duration'], unit='D')
df['date'] = pd.to_datetime(df['date'])
df['end_date'] = pd.to_datetime(df['end_date'])
df
df:
date id product duration product_in_use end_date
0 2020-01-01 1 list_3 3 days true 2020-01-03
1 2020-01-02 2 list_1 1 days true 2020-01-02
2 2020-01-03 3 list_2 2 days true 2020-01-04
3 2020-01-04 4 list_3 3 days true 2020-01-06
4 2020-01-05 5 list_2 2 days true 2020-01-06
5 2020-01-06 6 list_1 1 days true 2020-01-06
6 2020-01-07 7 list_1 1 days true 2020-01-07
As you can see in the df above, each id is using a product and each product lasts for a particular duration. There are no rows for the days when the product is in action, only rows for when the product was purchased by the user.
Therefore, I would like to create new rows, for all dates where the product is in use for each id.
My intended df would therefore be this:
date id product duration product_in_use
0 2020-01-01 1 list_3 3 days true
1 2020-01-02 1 list_3 3 days true
2 2020-01-03 1 list_3 3 days true
3 2020-01-02 2 list_1 1 days true
4 2020-01-03 3 list_2 2 days true
5 2020-01-04 3 list_2 2 days true
6 2020-01-04 4 list_3 3 days true
7 2020-01-05 4 list_3 3 days true
8 2020-01-06 4 list_3 3 days true
9 2020-01-05 5 list_2 3 days true
10 2020-01-06 5 list_2 2 days true
11 2020-01-06 6 list_1 2 days true
12 2020-01-07 7 list_1 1 days true
Upvotes: 4
Views: 1080
Reputation: 151
Create a different DataFrame
, then execute an outer join to add the new rows.
Upvotes: 1
Reputation: 28644
create date ranges from the start date to the end date for each id, using starmap and chain, expand the main dataframe by the duration, and assign the new dates as index of the dataframe.
from itertools import starmap,chain
#create date ranges from date to end_date for each id
start_end = zip(df.date.array,df.end_date.array)
date_ranges = starmap(pd.date_range,start_end)
date_ranges = chain.from_iterable(date_ranges)
#get all columns except date and end_date
res = df.filter(['id','product','duration','product_in_use'])
#expand the dataframe by repeating the indexes based on the duration
#so index 0 will be repeated 3 times, 1 once, 2 twice, ...
res = res.reindex(res.index.repeat(res.duration.dt.days))
#assign the new date_ranges to the dataframe
res.index = date_ranges
res
id product duration product_in_use
2020-01-01 1 list_3 3 days true
2020-01-02 1 list_3 3 days true
2020-01-03 1 list_3 3 days true
2020-01-02 2 list_1 1 days true
2020-01-03 3 list_2 2 days true
2020-01-04 3 list_2 2 days true
2020-01-04 4 list_3 3 days true
2020-01-05 4 list_3 3 days true
2020-01-06 4 list_3 3 days true
2020-01-05 5 list_2 2 days true
2020-01-06 5 list_2 2 days true
2020-01-06 6 list_1 1 days true
2020-01-07 7 list_1 1 days true
Upvotes: 2
Reputation: 1413
If you do not convert 'duration' field to timedelta then this worked for me:
df1 = pd.DataFrame()
for idx in df.index:
print(idx, df['duration'][idx])
for i in range(df['duration'][idx]):
temp_df = df[idx:idx+1]
temp_df['date'] = pd.to_datetime(temp_df['date']) + timedelta(days=i)
df1 = df1.append(temp_df)
df1.reset_index(inplace=True)
df1.drop(['end_date', 'index'], axis=1, inplace=True)
print(df1)
Output:
date id product duration product_in_use
0 2020-01-01 1 list_3 3 true
1 2020-01-02 1 list_3 3 true
2 2020-01-03 1 list_3 3 true
3 2020-01-02 2 list_1 1 true
4 2020-01-03 3 list_2 2 true
5 2020-01-04 3 list_2 2 true
6 2020-01-04 4 list_3 3 true
7 2020-01-05 4 list_3 3 true
8 2020-01-06 4 list_3 3 true
9 2020-01-05 5 list_2 2 true
10 2020-01-06 5 list_2 2 true
11 2020-01-06 6 list_1 1 true
12 2020-01-07 7 list_1 1 true
Upvotes: 1
Reputation: 129
s = pd.concat(pd.Series(r.Index, pd.date_range(r.date,r.end_date))
for r in df.itertuples())
df1 = df.loc[s].assign(date=s.index).reset_index(drop=True)
print(df1)
Upvotes: 1