Reputation: 87
I have a data frame where I'm trying to add a new column based on the condition that adds 3 months to a DateTime.
ID1 ID2 Date
1 20 5/15/2019 11:06:47 AM
1 21 5/15/2019 11:06:47 AM
1 22 6/15/2019 11:06:47 AM
2 30 7/15/2019 11:06:47 AM
2 31 7/15/2019 11:06:47 AM
2 32 7/15/2019 11:06:47 AM
Required Output,
ID1 ID2 Date NewDate
1 20 5/15/2019 11:06:47 AM 8/15/2019 11:06:47 AM
1 21 5/15/2019 11:06:47 AM 9/15/2019 11:06:47 AM
1 22 6/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
2 30 7/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
2 31 7/15/2019 11:06:47 AM 11/15/2019 11:06:47 AM
2 32 7/15/2019 11:06:47 AM 12/15/2019 11:06:47 AM
For each ID1, there can be only one unique NewDate. If there exists a date that may fall in the same month, then add another month.
For ID1, having a different Date, if the NewDate falls on a month similar to previous NewDate, then we add another additional DateOffset as seen in Row 3 of the required output
I have tried the following code,
def add_date(df):
for each_ID1 in df['ID1']:
for each_ID2 in df['ID2']:
return df['Date'] + DateOffset(months = 3)
df['New Date'] = df.apply(add_date, axis = 1)
My code gives me the 3-month DateOffset only as shown,
ID1 ID2 Date NewDate
1 20 5/15/2019 11:06:47 AM 8/15/2019 11:06:47 AM
1 21 5/15/2019 11:06:47 AM 8/15/2019 11:06:47 AM
1 22 6/15/2019 11:06:47 AM 9/15/2019 11:06:47 AM
2 30 7/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
2 31 7/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
2 32 7/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
Output Error
ID1 ID2 Date NewDate
1 20 5/15/2019 11:06:47 AM 8/15/2019 11:06:47 AM
1 21 5/15/2019 11:06:47 AM 9/15/2019 11:06:47 AM
1 22 5/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
1 23 5/15/2019 11:06:47 AM 11/15/2019 11:06:47 AM
1 24 5/15/2019 11:06:47 AM 12/15/2020 11:06:47 AM
1 25 5/15/2019 11:06:47 AM 01/15/2021 11:06:47 AM
1 26 6/15/2019 11:06:47 AM 10/15/2019 11:06:47 AM
1 27 6/15/2019 11:06:47 AM 12/15/2019 11:06:47 AM
1 28 6/15/2019 11:06:47 AM 02/15/2020 11:06:47 AM
1 29 6/15/2019 11:06:47 AM 04/15/2020 11:06:47 AM
1 30 6/15/2019 11:06:47 AM 06/15/2020 11:06:47 AM
1 31 6/15/2019 11:06:47 AM 07/15/2020 11:06:47 AM
Upvotes: 3
Views: 66
Reputation: 16683
m
with a .groupby
on the column ID1
, return the cumulative count per group, and add 3 (since that is how many months you want to offset initially). With .cumcount()
, the offset will grow by 1 with each additional row of your group.New Date
column but ONLY Add m
to the FIRST date of each group, so we use df.groupby('ID1')['Date'].transform('first')
before adding m.values.astype("timedelta64[M]")
:Input (Latest Edit of your question):
ID1 ID2 Date
1 20 5/15/2019 11:06:47 AM
1 21 5/15/2019 11:06:47 AM
1 22 5/15/2019 11:06:47 AM
1 23 5/15/2019 11:06:47 AM
1 24 5/15/2019 11:06:47 AM
1 25 5/15/2019 11:06:47 AM
1 26 6/15/2019 11:06:47 AM
1 27 6/15/2019 11:06:47 AM
1 28 6/15/2019 11:06:47 AM
1 29 6/15/2019 11:06:47 AM
1 30 6/15/2019 11:06:47 AM
1 31 6/15/2019 11:06:47 AM
# df['Date'] = pd.to_datetime(df['Date'])
m = df.groupby('ID1').cumcount() + 3
df['New Date'] = df.groupby('ID1')['Date'].transform('first') + m.values.astype("timedelta64[M]")
df
Out[1]:
ID1 ID2 Date New Date
0 1 20 2019-05-15 11:06:47 2019-08-14 18:34:05
1 1 21 2019-05-15 11:06:47 2019-09-14 05:03:11
2 1 22 2019-05-15 11:06:47 2019-10-14 15:32:17
3 1 23 2019-05-15 11:06:47 2019-11-14 02:01:23
4 1 24 2019-05-15 11:06:47 2019-12-14 12:30:29
5 1 25 2019-05-15 11:06:47 2020-01-13 22:59:35
6 1 26 2019-06-15 11:06:47 2020-02-13 09:28:41
7 1 27 2019-06-15 11:06:47 2020-03-14 19:57:47
8 1 28 2019-06-15 11:06:47 2020-04-14 06:26:53
9 1 29 2019-06-15 11:06:47 2020-05-14 16:55:59
10 1 30 2019-06-15 11:06:47 2020-06-14 03:25:05
11 1 31 2019-06-15 11:06:47 2020-07-14 13:54:11
Upvotes: 1