I have the following Pandas DataFrame:
ID start_date end_date codes type
1 2019-01-01 2019-01-05 [x, y] A
2 2019-01-01 2019-01-05 [x, y, z] B
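To reproduce the problem, the sample frame can be rebuilt roughly like this. This is a sketch that assumes the dates should be real datetimes and that each `codes` cell holds a plain Python list, as the printout suggests:

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed dtypes: datetime64
# for the dates, a Python list per row in `codes`).
df = pd.DataFrame({
    'ID': [1, 2],
    'start_date': pd.to_datetime(['2019-01-01', '2019-01-01']),
    'end_date': pd.to_datetime(['2019-01-05', '2019-01-05']),
    'codes': [['x', 'y'], ['x', 'y', 'z']],  # each cell is a list of codes
    'type': ['A', 'B'],
})
print(df)
```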
For each code, I want to generate one row per day from start_date to end_date (both inclusive). The output should look like this:
ID date codes type
1 2019-01-01 x A
1 2019-01-02 x A
1 2019-01-03 x A
1 2019-01-04 x A
1 2019-01-05 x A
1 2019-01-01 y A
1 2019-01-02 y A
1 2019-01-03 y A
1 2019-01-04 y A
1 2019-01-05 y A
2 2019-01-01 x B
2 2019-01-02 x B
.....
Thank you very much!
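import pandas as pd

# resample() needs real datetimes; this is a no-op if the columns already are
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

new_df = (df.melt(['ID', 'type', 'codes'], value_name='date')  # one row per boundary date
            .set_index('date')
            .groupby(['ID', 'type'])
            .resample('D').ffill()           # fill in every day between start and end
            # newer pandas may already leave ID/type out of the columns, hence errors='ignore'
            .drop(columns=['ID', 'type', 'variable'], errors='ignore')
            .explode('codes')                # one row per code (pandas >= 0.25)
            .reset_index()                   # bring ID, type and date back as columns
            .sort_values(['ID', 'type', 'codes', 'date'])
            .reset_index(drop=True)
            .reindex(columns=['ID', 'date', 'codes', 'type']))
print(new_df)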
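To see what the first steps of that chain do, here is a small sketch on the sample data (rebuilt by hand, assuming datetime columns). `melt` turns each input row into two rows, one per boundary date, and `groupby(...).resample('D').ffill()` then fills in every calendar day between them:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'start_date': pd.to_datetime(['2019-01-01', '2019-01-01']),
    'end_date': pd.to_datetime(['2019-01-05', '2019-01-05']),
    'codes': [['x', 'y'], ['x', 'y', 'z']],
    'type': ['A', 'B'],
})

# melt: start_date and end_date become two rows per ID in a single 'date' column
long_df = df.melt(['ID', 'type', 'codes'], value_name='date')
print(long_df)

# resample by day inside each (ID, type) group; ffill carries codes forward
daily = (long_df.set_index('date')
                .groupby(['ID', 'type'])
                .resample('D').ffill())
print(daily)
```

Each ID spans 5 days, so `daily` has 10 rows before the codes are expanded.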
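If your pandas version is older than 0.25 and has no explode(), you can repeat the rows and flatten the lists with NumPy instead:
import numpy as np
import pandas as pd

# resample() needs real datetimes; this is a no-op if the columns already are
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

new_df = (df.melt(['ID', 'type', 'codes'], value_name='date')
            .set_index('date')
            .groupby(['ID', 'type'])
            .resample('D').ffill()
            .drop(columns=['ID', 'type', 'variable'], errors='ignore'))

# repeat each row once per code, then overwrite codes with the flattened lists
new_df = (new_df.reindex(new_df.index.repeat(new_df.codes.str.len()))
                .assign(codes=np.concatenate(new_df.codes.values))
                .reset_index()
                .sort_values(['ID', 'type', 'codes', 'date'])
                .reset_index(drop=True)
                .reindex(columns=['ID', 'date', 'codes', 'type']))
print(new_df)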
Output
ID date codes type
0 1 2019-01-01 x A
1 1 2019-01-02 x A
2 1 2019-01-03 x A
3 1 2019-01-04 x A
4 1 2019-01-05 x A
5 1 2019-01-01 y A
6 1 2019-01-02 y A
7 1 2019-01-03 y A
8 1 2019-01-04 y A
9 1 2019-01-05 y A
10 2 2019-01-01 x B
11 2 2019-01-02 x B
12 2 2019-01-03 x B
13 2 2019-01-04 x B
14 2 2019-01-05 x B
15 2 2019-01-01 y B
16 2 2019-01-02 y B
17 2 2019-01-03 y B
18 2 2019-01-04 y B
19 2 2019-01-05 y B
20 2 2019-01-01 z B
21 2 2019-01-02 z B
22 2 2019-01-03 z B
23 2 2019-01-04 z B
24 2 2019-01-05 z B
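A different technique (not the one used above) skips melt/resample entirely: build the list of days for each row with `pd.date_range` and explode both list columns. This is a sketch on the hand-built sample frame; it assumes pandas >= 0.25 for `explode`:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'start_date': pd.to_datetime(['2019-01-01', '2019-01-01']),
    'end_date': pd.to_datetime(['2019-01-05', '2019-01-05']),
    'codes': [['x', 'y'], ['x', 'y', 'z']],
    'type': ['A', 'B'],
})

# one list of daily Timestamps per row, inclusive of both end points
days = [list(pd.date_range(s, e, freq='D'))
        for s, e in zip(df['start_date'], df['end_date'])]

out = (df.assign(date=days)
         .explode('date')    # one row per day
         .explode('codes')   # then one row per code
         .sort_values(['ID', 'type', 'codes', 'date'])
         .reset_index(drop=True)
         [['ID', 'date', 'codes', 'type']])
out['date'] = pd.to_datetime(out['date'])  # explode leaves an object column
print(out)
```

It avoids relying on how `groupby(...).resample()` treats the grouping columns, which has changed across pandas versions.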