qqqwww

Reputation: 531

Python DataFrame: add a number of months taken from another column to a date


I have a DataFrame with a Date column and a month_diff column. I would like to compute a new date (call it Target_Date) with the following logic: add month_diff months to the original date, then move to the end of that month. For example, if the date is 2/13/2019 and month_diff is 3, the target date should be 5/31/2019.

I first tried the following to get the target date:

df["Target_Date"] = df["Date"] + pd.DateOffset(months = df["month_diff"])

But it failed. As far as I know, the months parameter of DateOffset has to be a single variable or a fixed number, not a whole column.

I also tried:

df["Target_Date"] = df["Date"] + relativedelta(months = df["month_diff"])

It fails too.

Can anyone help? Thank you.

Edit: this is a large dataset with millions of rows.

Upvotes: 0

Views: 3703

Answers (4)

Kamil Kacperski

Reputation: 93

I was looking for a solution I could write in one line, and apply does the job. Note that by default apply passes each column to the function, so you have to remember to specify axis=1 to have it called once per row.

from dateutil.relativedelta import relativedelta

# Add month_diff months to each date, then snap to the last day of that month
df['Target_Date'] = df.apply(
    lambda row: row.Date                        # start from the current date
        + relativedelta(months=row.month_diff)  # add month_diff months
        + relativedelta(day=31),                # day=31 is clipped to the month's last day
    axis=1)                                     # axis=1 (or 'columns'): call the function once per row
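
A quick way to sanity-check this one-liner is to run it on a tiny frame built from the example in the question. The sketch below assumes the columns are named Date and month_diff as in the question; the two sample rows are made up for illustration.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical sample data: the first row is the example from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-02-13", "2019-01-31"]),
    "month_diff": [3, 1],
})

# Add month_diff months, then move to the last day of the resulting month
df["Target_Date"] = df.apply(
    lambda row: row.Date
    + relativedelta(months=row.month_diff)
    + relativedelta(day=31),
    axis=1,
)

print(df)
```

The second row is there to check the clipping: January 31 plus one month lands in February, and day=31 is reduced to the last valid day of that month.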

Upvotes: 0

Unix

Reputation: 89

This is my approach to your issue. However, for some reason the output is semantically wrong (there is no exception, but the values are not what I expect), even though I believe the approach is correct. Please correct me if you notice something wrong.

import pandas as pd
from datetime import datetime
from datetime import timedelta

today = datetime.now()
today = today.strftime("%d/%m/%Y")
month_diff =[30,5,7]
n = 30
for i in month_diff:
    b = {'Date': today, 'month_diff':month_diff,"Target_Date": datetime.now()+timedelta(days=i*n)}
    df = pd.DataFrame(data=b)

For some reason, i does not seem to get updated.
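
As far as I can tell, i is updated on every pass, but two things hide it: the scalar Target_Date is broadcast to all three rows of the frame, and df is rebuilt on each iteration, so only the last pass (i = 7) survives. Multiplying by 30 days also only approximates calendar months. A minimal sketch of one possible fix is below; it assumes a fixed start date (instead of datetime.now()) so the result is reproducible, and it uses dateutil's relativedelta rather than timedelta.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Assumption: a fixed start date so the output does not depend on the run time
start = pd.Timestamp("2019-02-13")
month_diff = [30, 5, 7]

# Compute one target per month_diff value before building the frame.
# months=m shifts by calendar months; day=31 is clipped to the month's last day.
targets = [start + relativedelta(months=m, day=31) for m in month_diff]

# Build the frame once; the scalar start date is broadcast, the lists are not
df = pd.DataFrame({"Date": start, "month_diff": month_diff, "Target_Date": targets})
print(df)
```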

Upvotes: 0

bramb

Reputation: 253

I would compute your target date as follows.

  1. Apply the target month offset (in your case +3 months), using your pd.DateOffset.
  2. Get the last day of that target month (for example with calendar.monthrange; see also "Get last day of the month"). This gives you the "flexible" part of the date offset.
  3. Apply the flexible day offset, i.e. the difference between the results of step 1 and step 2. This could be a new pd.DateOffset.

A solution could look something like this:

import calendar
from dateutil.relativedelta import relativedelta

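for ii in df.index:
    # Step 1: shift the date by month_diff months
    new_ = df.at[ii, 'Date'] + relativedelta(months=int(df.at[ii, 'month_diff']))
    # Step 2: number of days in the target month
    max_date = calendar.monthrange(new_.year, new_.month)[1]
    # Step 3: move forward to the last day of that month
    end_ = new_ + relativedelta(days=max_date - new_.day)
    print(end_)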

Wrapping this in a function and/or using a list comprehension will probably make it faster.
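
One way this clean-up could look is sketched below. The helper name month_end_after and the sample rows are made up for illustration, and the columns are assumed to be named Date and month_diff as in the question. It is still a Python-level loop, so any speed-up over the index-based version should be measured rather than assumed.

```python
import calendar

import pandas as pd
from dateutil.relativedelta import relativedelta


def month_end_after(date, months):
    """Shift `date` by `months` calendar months and return that month's last day."""
    shifted = date + relativedelta(months=int(months))
    last_day = calendar.monthrange(shifted.year, shifted.month)[1]
    return shifted.replace(day=last_day)


# Hypothetical sample data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-02-13", "2019-12-15"]),
    "month_diff": [3, 2],
})

# Iterate over both columns in parallel instead of looking up each cell with df.at
df["Target_Date"] = [
    month_end_after(d, m) for d, m in zip(df["Date"], df["month_diff"])
]
print(df)
```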

Upvotes: 0

Buckeye14Guy

Reputation: 851

You could try this:

import pandas as pd
from dateutil.relativedelta import relativedelta
# pd.Timestamp replaces pd.datetime, which was deprecated and later removed from pandas
df = pd.DataFrame({'Date': [pd.Timestamp(2019, 1, 1), pd.Timestamp(2019, 2, 1)], 'month_diff': [1, 2]})
df.apply(lambda row: row.Date + relativedelta(months=row.month_diff), axis=1)

Or with a list comprehension:

[date + relativedelta(months=month_diff) for date, month_diff in df[['Date', 'month_diff']].values]
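
Both variants above add the months but do not move to the month-end, and both run a Python function per row, which may be slow with millions of rows. A possible vectorized alternative is sketched below using NumPy's month-resolution datetime64 arithmetic instead of relativedelta. It assumes there are no missing values in either column; the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the first row is the example from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-02-13", "2019-11-30", "2020-01-31"]),
    "month_diff": [3, 3, 1],
})

# Truncate each date to its month, then shift by month_diff whole months
month_start = df["Date"].to_numpy().astype("datetime64[M]")
shifted = month_start + df["month_diff"].to_numpy().astype("timedelta64[M]")

# Last day of the shifted month = first day of the following month minus one day
month_end = (shifted + np.timedelta64(1, "M")).astype("datetime64[D]") - np.timedelta64(1, "D")

# Convert back to nanosecond resolution before storing the column
df["Target_Date"] = month_end.astype("datetime64[ns]")
print(df)
```

Working at month resolution sidesteps the varying month lengths: the day of the month is dropped first, so only the final "minus one day" step has to deal with 28, 29, 30 or 31 days.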

Upvotes: 1
