Reputation: 8950
I have a dataframe with columns of dates.
I know how to shift dates by a fixed number of months (e.g. add 3 months to all the dates in column x). However, I cannot figure out how to shift dates by a number of months that is not fixed but comes from another column of the dataframe.
Any ideas?
I have copied a minimal example below. The error I get is:
The truth value of a Series is ambiguous
Thanks a lot!
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3
df['mydate'] = pd.to_datetime( (df.year * 10000 + df.month * 100 +1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)
# if I want to shift mydate by 3 months, I can convert it to DatetimeIndex and use dateOffset:
df['my date shifted by 3 months'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = 3)
# however, how do I shift mydate by the number of months in the column 'month shift'?
#This does NOT work:
df['my date shifted'] = pd.DatetimeIndex( df['mydate'] ) + pd.DateOffset(months = df['month shift'])
print(df)
Upvotes: 7
Views: 27093
Reputation: 500
I used DateOffset from pandas to make this.
import pandas as pd
from pandas.tseries.offsets import DateOffset
def shift_months():
    input_days = ['2024-09-26', '2024-10-26', '2024-11-26', '2024-12-26', '2025-01-26']
    for input_day in input_days:
        # Parse the string, then add a fixed two-month calendar offset
        input_date_time = pd.to_datetime(input_day)
        new_date_time = input_date_time + DateOffset(months=2)
        print(f'Original - {input_day} >>> Shifted - {new_date_time}')
shift_months()
Here is the output -
Original - 2024-09-26 >>> Shifted - 2024-11-26 00:00:00
Original - 2024-10-26 >>> Shifted - 2024-12-26 00:00:00
Original - 2024-11-26 >>> Shifted - 2025-01-26 00:00:00
Original - 2024-12-26 >>> Shifted - 2025-02-26 00:00:00
Original - 2025-01-26 >>> Shifted - 2025-03-26 00:00:00
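The loop above uses the same offset for every date. A minimal sketch of the per-row case the question asks about, assuming the shifts are available as a parallel list (`month_shifts` is a hypothetical name, not part of the answer above):

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

input_days = ['2024-09-26', '2024-10-26', '2024-11-26']
month_shifts = [1, 2, 3]  # hypothetical per-date shifts, one per input day

# Pair each date with its own shift and build a separate DateOffset for each pair
shifted = [
    pd.to_datetime(day) + DateOffset(months=months)
    for day, months in zip(input_days, month_shifts)
]

for day, new_date in zip(input_days, shifted):
    print(f'Original - {day} >>> Shifted - {new_date}')
```

This is still a Python-level loop, so it is likely to be slow on large dataframes.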
Upvotes: 0
Reputation: 1100
"one"-liner using the underlying numpy functionality:
df['my date shifted'] = (
    df["mydate"].values.astype("datetime64[M]")
    + df["month shift"].values.astype("timedelta64[M]")
)
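One thing worth knowing about this approach: casting to `datetime64[M]` drops the day of the month, so every result lands on the first of its month. A small sketch on made-up dates (not the data from the question) to show the effect:

```python
import numpy as np
import pandas as pd

# Illustrative data: two of the dates are not on the first of the month
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2000-03-01", "2001-03-15", "2002-12-20"]),
    "month shift": [0, 1, 2],
})

months = df["mydate"].values.astype("datetime64[M]")        # truncates to month precision
shift = df["month shift"].values.astype("timedelta64[M]")   # integer counts become month deltas

# Month-precision arithmetic, converted back to nanoseconds for pandas
df["my date shifted"] = (months + shift).astype("datetime64[ns]")
print(df)
```

For the question's data, where every date is already the first of a month, nothing is lost.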
Upvotes: 4
Reputation: 582
EdChurn's solution is indeed much faster than Anton Protopopov's answer: in my use case it executes in milliseconds, whereas the one with apply takes minutes. The problem is that the solution EdChurn posted in his comment gives slightly incorrect results. In the example:
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame()
df['year'] = np.arange(2000,2010)
df['month'] = 3
df['mydate'] = pd.to_datetime((df.year * 10000 + df.month * 100 + 1).apply(str), format='%Y%m%d')
df['month shift'] = np.arange(0,10)
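The suggested answer was:
df['my date shifted'] = df['mydate'] + pd.TimedeltaIndex( df['month shift'], unit='M')
This treats every month as a fixed average length, so the shifted dates drift away from the first of the month. The correct solution can be obtained with: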
def set_to_month_begin(series):
    # Following doesn't work:
    # res = series.dt.floor("MS")
    # This also doesn't work (it fails in case the date is already the first day of the month):
    # res = series - pd.offsets.MonthBegin(1)
    res = pd.to_datetime(series).dt.normalize()
    # Subtract (day - 1) days so that every date lands on the first of its month
    res = res - pd.to_timedelta(res.dt.day - 1, unit='d')
    return res
def add_months(df, date_col, months_num_col):
    """This function adds the number of months specified per each row in `months_num_col` to date in `date_col`.
    This method is *significantly* faster than:
    df.apply(lambda x: x[date_col] + pd.DateOffset(months = x[months_num_col]), axis=1)
    """
    number_of_days_in_avg_month = 365.24 / 12
    # The extra 10 days keep the approximate result safely inside the target month
    # (pd.to_timedelta is used because the `unit` keyword of pd.TimedeltaIndex is deprecated in recent pandas)
    time_delta = pd.to_timedelta(df[months_num_col] * number_of_days_in_avg_month + 10, unit='D')
    return set_to_month_begin(df[date_col] + time_delta)
df['my date shifted'] = add_months(df, 'mydate', 'month shift')
This gives the following result:
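As a sanity check, here is a minimal sketch that compares the approach with row-wise `pd.DateOffset` on the example data. It restates the two steps inline rather than calling the functions above, and the names `approx`, `vectorized` and `reference` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as above: 1 March of each year from 2000 to 2009
df = pd.DataFrame({"year": np.arange(2000, 2010)})
df["mydate"] = pd.to_datetime([f"{y}-03-01" for y in df["year"]])
df["month shift"] = np.arange(0, 10)

# Step 1: add an approximate number of days (average month length plus a 10-day cushion)
approx = df["mydate"] + pd.to_timedelta(df["month shift"] * (365.24 / 12) + 10, unit="D")
approx = approx.dt.normalize()

# Step 2: snap each approximate date back to the first day of its month
vectorized = approx - pd.to_timedelta(approx.dt.day - 1, unit="D")

# Reference: exact calendar arithmetic, one row at a time
reference = pd.Series(
    [d + pd.DateOffset(months=int(m)) for d, m in zip(df["mydate"], df["month shift"])],
    index=df.index,
)

matches = bool((vectorized == reference).all())
print(matches)
```

Note that the 10-day cushion assumes the input dates sit at the start of a month; dates late in a month could be pushed into the following month before the snap.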
Upvotes: 1
Reputation: 31672
IIUC you could use apply with axis=1:
In [23]: df.apply(lambda x: x['mydate'] + pd.DateOffset(months = x['month shift']), axis=1)
Out[23]:
0 2000-03-01
1 2001-04-01
2 2002-05-01
3 2003-06-01
4 2004-07-01
5 2005-08-01
6 2006-09-01
7 2007-10-01
8 2008-11-01
9 2009-12-01
dtype: datetime64[ns]
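If the row-wise apply turns out to be slow, one possible alternative (a sketch, not part of the answer above) is to group the rows by shift value and add a single DateOffset to each group, so the number of Python-level iterations equals the number of distinct shifts rather than the number of rows. The data and the column name `shifted` below are made up for illustration:

```python
import pandas as pd

# Illustrative data, including month-end dates to show calendar clipping
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2000-01-31", "2000-03-15", "2001-03-01", "2002-11-30"]),
    "month shift": [1, 1, 2, 3],
})

# One vectorized addition per distinct shift value
parts = [
    group + pd.DateOffset(months=int(months))
    for months, group in df.groupby("month shift")["mydate"]
]

# Each piece keeps its original index, so assignment realigns the rows
df["shifted"] = pd.concat(parts)
print(df)
```

Like the apply version, this keeps the day of the month and clips to the last valid day when the target month is shorter.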
Upvotes: 5