Reputation: 216
Name Start End Units Place
Sam 04-03-2022 06-03-2022 2 CA
Uber 24-04-2022 27-05-2022 1 SVL
Twitter 26-04-2022 28-04-2022 2 FR
My dataframe is like above. I wish to duplicate each row by n times where n equal to the difference between Start and End entry. But while duplicating the Start has to increment by one each time. So, My output need to be something like below:
Name Start Units Place
Sam 04-03-2022 2 CA
Sam 05-03-2022 2 CA
Uber 24-04-2022 1 SVL
Uber 25-04-2022 1 SVL
Uber 26-04-2022 1 SVL
Twitter 26-04-2022 2 FR
Twitter 27-04-2022 2 FR
I am starring at this for quite sometime. But, clueless.
Upvotes: 0
Views: 19
Reputation: 120479
You can apply pd.date_range
to each row then explode your dataframe:
# Not mandatory if it's already the case
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
date_range = lambda x: pd.date_range(x['Start'], x['End']-pd.DateOffset(days=1))
out = (df.assign(Start=df.apply(date_range, axis=1))
.explode('Start', ignore_index=True).drop(columns='End'))
Output:
>>> df
Name Start Units Place
0 Sam 2022-03-04 2 CA
1 Sam 2022-03-05 2 CA
2 Uber 2022-04-24 1 SVL
3 Uber 2022-04-25 1 SVL
4 Uber 2022-04-26 1 SVL
5 Uber 2022-04-27 1 SVL
6 Uber 2022-04-28 1 SVL
7 Uber 2022-04-29 1 SVL
8 Uber 2022-04-30 1 SVL
9 Uber 2022-05-01 1 SVL
10 Uber 2022-05-02 1 SVL
11 Uber 2022-05-03 1 SVL
12 Uber 2022-05-04 1 SVL
13 Uber 2022-05-05 1 SVL
14 Uber 2022-05-06 1 SVL
15 Uber 2022-05-07 1 SVL
16 Uber 2022-05-08 1 SVL
17 Uber 2022-05-09 1 SVL
18 Uber 2022-05-10 1 SVL
19 Uber 2022-05-11 1 SVL
20 Uber 2022-05-12 1 SVL
21 Uber 2022-05-13 1 SVL
22 Uber 2022-05-14 1 SVL
23 Uber 2022-05-15 1 SVL
24 Uber 2022-05-16 1 SVL
25 Uber 2022-05-17 1 SVL
26 Uber 2022-05-18 1 SVL
27 Uber 2022-05-19 1 SVL
28 Uber 2022-05-20 1 SVL
29 Uber 2022-05-21 1 SVL
30 Uber 2022-05-22 1 SVL
31 Uber 2022-05-23 1 SVL
32 Uber 2022-05-24 1 SVL
33 Uber 2022-05-25 1 SVL
34 Uber 2022-05-26 1 SVL
35 Twitter 2022-04-26 2 FR
36 Twitter 2022-04-27 2 FR
Upvotes: 1