Reputation: 349
I have a pandas dataframe shown below (sample) and I want to create a new table with an extra column 'NewDate' which will look at StartDate and show the last date of the month for start date and subsequently last date of every month till the end date for each ID and if my ID has End Date as Null the series will stop at the last date of current month which is May 2022.
ID StartDate EndDate
100 1/01/2022 26/04/2022
101 20/04/2022 Null
102 1/01/2022 27/02/2022
....
My Expected Output:
ID StartDate EndDate NewDate
100 1/01/2022 26/04/2022 31/01/2022
100 1/01/2022 26/04/2022 28/02/2022
100 1/01/2022 26/04/2022 31/03/2022
100 1/01/2022 26/04/2022 30/04/2022
101 20/04/2022 Null 30/04/2022
101 20/04/2022 Null 31/05/2022
102 1/01/2022 27/02/2022 31/01/2022
102 1/01/2022 27/02/2022 28/02/2022
...
Upvotes: 1
Views: 2149
Reputation: 23071
Try this
# convert each date column to datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# create date ranges for each row
f = lambda s,e: pd.date_range(s, e+pd.DateOffset(months=1), freq='M')
df['NewDate'] = [f(s,e) if e==e else f(s,pd.datetime.now()) for s, e in zip(df['StartDate'], df['EndDate'])]
# explode the new column
df = df.explode('NewDate')
print(df)
ID StartDate EndDate NewDate
0 100 2022-01-01 2022-04-26 2022-01-31
0 100 2022-01-01 2022-04-26 2022-02-28
0 100 2022-01-01 2022-04-26 2022-03-31
0 100 2022-01-01 2022-04-26 2022-04-30
1 101 2022-04-20 NaT 2022-04-30
1 101 2022-04-20 NaT 2022-05-31
2 102 2022-01-01 2022-02-27 2022-01-31
2 102 2022-01-01 2022-02-27 2022-02-28
Upvotes: 1
Reputation: 323226
You need trim your date first , then we create the date range with pd.date_range
and explode
the column
s1 = pd.to_datetime(df.StartDate, format = '%d/%m/%Y')
s2 = pd.to_datetime(df.EndDate, format = '%d/%m/%Y', errors = 'coerce') + pd.offsets.MonthEnd(0)
s2 = s2.fillna(s1 + pd.offsets.MonthEnd(2))
df['new'] = [pd.date_range(x, y , freq= 'M',closed = 'left') for x , y in zip(df.StartDate, s+pd.offsets.MonthEnd(1))]
out = df.explode('new')
out
Out[206]:
ID StartDate EndDate new
0 100 1/01/2022 26/04/2022 2022-01-31
0 100 1/01/2022 26/04/2022 2022-02-28
0 100 1/01/2022 26/04/2022 2022-03-31
0 100 1/01/2022 26/04/2022 2022-04-30
1 101 20/04/2022 Null 2022-04-30
1 101 20/04/2022 Null 2022-05-31
2 102 1/01/2022 27/02/2022 2022-01-31
2 102 1/01/2022 27/02/2022 2022-02-28
Updated
s2 = s2.fillna(s1 + pd.offsets.MonthEnd(1))
df['new_date'] = [pd.date_range(x, y , freq= 'M',closed = 'left') for x , y in zip(df.start_date, s2+pd.offsets.MonthEnd(1))]
output = df.explode('new_date')
Upvotes: 1