Cumulatively add month to a Date column based off the first date of a group

Question

I have a data frame where I'm trying to add a new column based on the condition that adds 3 months to a DateTime.

ID1    ID2          Date
 1     20    5/15/2019  11:06:47 AM
 1     21    5/15/2019  11:06:47 AM
 1     22    6/15/2019  11:06:47 AM
 2     30    7/15/2019  11:06:47 AM
 2     31    7/15/2019  11:06:47 AM
 2     32    7/15/2019  11:06:47 AM

Required Output,

ID1    ID2          Date                   NewDate
 1     20    5/15/2019 11:06:47 AM    8/15/2019 11:06:47 AM
 1     21    5/15/2019 11:06:47 AM    9/15/2019 11:06:47 AM
 1     22    6/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 2     30    7/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 2     31    7/15/2019 11:06:47 AM    11/15/2019 11:06:47 AM
 2     32    7/15/2019 11:06:47 AM    12/15/2019 11:06:47 AM

For each ID1, there can be only one unique NewDate. If there exists a date that may fall in the same month, then add another month.

For ID1, having a different Date, if the NewDate falls on a month similar to previous NewDate, then we add another additional DateOffset as seen in Row 3 of the required output

I have tried the following code,

def add_date(df):
    for each_ID1 in df['ID1']:
        for each_ID2 in df['ID2']:
            return df['Date'] + DateOffset(months = 3)
    
df['New Date'] = df.apply(add_date, axis = 1)

My code gives me the 3-month DateOffset only as shown,

ID1    ID2          Date                   NewDate
 1     20    5/15/2019 11:06:47 AM    8/15/2019 11:06:47 AM
 1     21    5/15/2019 11:06:47 AM    8/15/2019 11:06:47 AM
 1     22    6/15/2019 11:06:47 AM    9/15/2019 11:06:47 AM
 2     30    7/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 2     31    7/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 2     32    7/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM

Output Error

ID1    ID2          Date                   NewDate
 1     20    5/15/2019 11:06:47 AM    8/15/2019 11:06:47 AM
 1     21    5/15/2019 11:06:47 AM    9/15/2019 11:06:47 AM
 1     22    5/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 1     23    5/15/2019 11:06:47 AM    11/15/2019 11:06:47 AM
 1     24    5/15/2019 11:06:47 AM    12/15/2020 11:06:47 AM
 1     25    5/15/2019 11:06:47 AM    01/15/2021 11:06:47 AM
 1     26    6/15/2019 11:06:47 AM    10/15/2019 11:06:47 AM
 1     27    6/15/2019 11:06:47 AM    12/15/2019 11:06:47 AM
 1     28    6/15/2019 11:06:47 AM    02/15/2020 11:06:47 AM
 1     29    6/15/2019 11:06:47 AM    04/15/2020 11:06:47 AM
 1     30    6/15/2019 11:06:47 AM    06/15/2020 11:06:47 AM
 1     31    6/15/2019 11:06:47 AM    07/15/2020 11:06:47 AM

David Erickson · Accepted Answer

You can create a series m with a .groupby on the column ID1, return the cumulative count per group, and add 3 (since that is how many months you want to offset initially). With .cumcount(), the offset will grow by 1 with each additional row of your group.
Then, we want to create the New Date column but ONLY Add m to the FIRST date of each group, so we use df.groupby('ID1')['Date'].transform('first') before adding m.values.astype("timedelta64[M]"):

Input (Latest Edit of your question):

ID1    ID2          Date        
 1     20    5/15/2019 11:06:47 AM 
 1     21    5/15/2019 11:06:47 AM 
 1     22    5/15/2019 11:06:47 AM 
 1     23    5/15/2019 11:06:47 AM
 1     24    5/15/2019 11:06:47 AM
 1     25    5/15/2019 11:06:47 AM
 1     26    6/15/2019 11:06:47 AM 
 1     27    6/15/2019 11:06:47 AM 
 1     28    6/15/2019 11:06:47 AM 
 1     29    6/15/2019 11:06:47 AM  
 1     30    6/15/2019 11:06:47 AM 
 1     31    6/15/2019 11:06:47 AM

# df['Date'] = pd.to_datetime(df['Date'])
m = df.groupby('ID1').cumcount() + 3
df['New Date'] = df.groupby('ID1')['Date'].transform('first') + m.values.astype("timedelta64[M]")
df
Out[1]: 
    ID1  ID2                Date            New Date
0     1   20 2019-05-15 11:06:47 2019-08-14 18:34:05
1     1   21 2019-05-15 11:06:47 2019-09-14 05:03:11
2     1   22 2019-05-15 11:06:47 2019-10-14 15:32:17
3     1   23 2019-05-15 11:06:47 2019-11-14 02:01:23
4     1   24 2019-05-15 11:06:47 2019-12-14 12:30:29
5     1   25 2019-05-15 11:06:47 2020-01-13 22:59:35
6     1   26 2019-06-15 11:06:47 2020-02-13 09:28:41
7     1   27 2019-06-15 11:06:47 2020-03-14 19:57:47
8     1   28 2019-06-15 11:06:47 2020-04-14 06:26:53
9     1   29 2019-06-15 11:06:47 2020-05-14 16:55:59
10    1   30 2019-06-15 11:06:47 2020-06-14 03:25:05
11    1   31 2019-06-15 11:06:47 2020-07-14 13:54:11

Cumulatively add month to a Date column based off the first date of a group

Answers (1)

Related Questions