Reputation: 771
I have two columns that are datetime64[ns] objects. I am trying to determine the number of months between them.
The columns are:
city_clean['last_trip_date']
city_clean['signup_date']
Format is YYYY-MM-DD
I tried
from dateutil.relativedelta import relativedelta
city_clean['months_active'] = relativedelta(city_clean['signup_date'], city_clean['last_trip_date'])
And get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Does anyone know what could cause this issue? I feel like this is the most accurate way to calculate the number of months.
Upvotes: 2
Views: 13375
Reputation: 743
The first thing that comes to my mind...
>>> from datetime import datetime, timedelta
>>> dt1 = datetime(year=2020, month=3, day=1)
>>> dt2 = datetime(year=2020, month=5, day=1)
>>> # delta = dt2-dt1
>>> delta = abs(dt2-dt1)
>>> delta
datetime.timedelta(days=61)
>>> delta.days
61
UPDATE: What I meant to show is the idea of taking the absolute value of the delta with abs().
In Python 3.10 it also works with dateutil's relativedelta():
from datetime import datetime
from dateutil.relativedelta import relativedelta
city_clean_dates = [
    {'signup_date': '2019-12-01', 'last_trip_date': '2020-02-01'},
    {'signup_date': '2021-01-01', 'last_trip_date': '2020-05-01'},
    {'signup_date': '2020-03-01', 'last_trip_date': '2020-05-31'},
]
for city_clean in city_clean_dates:
    city_clean['last_trip_date'] = datetime.strptime(city_clean['last_trip_date'], '%Y-%m-%d')
    city_clean['signup_date'] = datetime.strptime(city_clean['signup_date'], '%Y-%m-%d')
    rd1 = abs(relativedelta(city_clean['last_trip_date'], city_clean['signup_date']))
    rd2 = abs(relativedelta(city_clean['signup_date'], city_clean['last_trip_date']))
    assert rd1 == rd2
    print(f"Recent - old date: {rd1}")
    print(f"Old - recent date: {rd2}")
This would print:
Recent - old date: relativedelta(months=+2)
Old - recent date: relativedelta(months=+2)
Recent - old date: relativedelta(months=+8)
Old - recent date: relativedelta(months=+8)
Recent - old date: relativedelta(months=+2, days=+30)
Old - recent date: relativedelta(months=+2, days=+30)
Note that neither of my solutions returns a plain month count: the first returns days only, and the second returns whole months plus the extra days of the partial month.
The ambiguity is obvious in a case like
{'last_trip_date': '2020-03-01', 'signup_date': '2020-05-31'}
where we would normally call it 3 months, but it is actually one day short of that. It is up to the developer to resolve this ambiguity according to the use case.
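One way to settle the partial-month question is to make the rounding policy explicit. The sketch below is only an illustration: `months_between` and its `round_up` flag are hypothetical names, not part of dateutil, and it assumes you want either completed months only or a trailing partial month counted as a full one.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

def months_between(start, end, round_up=False):
    """Month count between two datetimes (hypothetical helper).

    round_up=False counts completed months only;
    round_up=True counts a trailing partial month as a full month.
    """
    rd = abs(relativedelta(end, start))
    months = rd.years * 12 + rd.months
    # Any leftover days or smaller units mean a partial month remains.
    has_partial = any((rd.days, rd.hours, rd.minutes, rd.seconds, rd.microseconds))
    if round_up and has_partial:
        months += 1
    return months

start = datetime(2020, 3, 1)
end = datetime(2020, 5, 31)
print(months_between(start, end))                 # 2
print(months_between(start, end, round_up=True))  # 3
```

Which policy is right depends on what "months active" should mean for your data.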
Upvotes: 2
Reputation: 327
You need to extract the properties you want from the relativedelta, in this case .years and .months:
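from dateutil.relativedelta import relativedelta
# relativedelta takes two scalar datetimes, not whole Series, so apply it row by row
def months_active(row):
    rel = relativedelta(row['last_trip_date'], row['signup_date'])
    return rel.years * 12 + rel.months
city_clean['months_active'] = city_clean.apply(months_active, axis=1)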
Upvotes: 3
Reputation: 19000
This is Pandas, right? Try it like this:
import numpy as np
# calculate the difference between two dates
df['diff_months'] = df['End_date'] - df['Start_date']
# convert the difference to months (the capital 'M' in timedelta64(1, 'M') means months)
# note: recent pandas versions (2.x) reject the ambiguous 'M' unit here
df['diff_months'] = df['diff_months'] / np.timedelta64(1, 'M')
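If the 'M' unit is not accepted by your pandas version, a rough equivalent is to divide the elapsed days by an average month length. This is a sketch under assumptions: the sample data is made up, and 365.2425 / 12 days per month is an approximation, so the result is a fractional estimate rather than a calendar-month count.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the snippet above.
df = pd.DataFrame({
    'Start_date': pd.to_datetime(['2019-12-01', '2020-03-01']),
    'End_date': pd.to_datetime(['2020-02-01', '2020-05-31']),
})

# Average Gregorian month length in days (about 30.44); an approximation.
AVG_DAYS_PER_MONTH = 365.2425 / 12

elapsed = df['End_date'] - df['Start_date']
# Timedelta Series / one-day Timedelta gives float days, then scale to months.
df['diff_months'] = elapsed / pd.Timedelta(days=1) / AVG_DAYS_PER_MONTH
print(df['diff_months'].round(2).tolist())
```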
Or, if you have proper datetime objects:
def diff_month(d1, d2):
    return (d1.year - d2.year) * 12 + d1.month - d2.month
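The same arithmetic can be applied to whole datetime64[ns] columns through the .dt accessor, which avoids a row-by-row loop. A minimal sketch, assuming the column names from the question and made-up sample rows; `diff_month_series` is a hypothetical helper, and like `diff_month` it ignores the day of the month.

```python
import pandas as pd

# Hypothetical sample frame using the column names from the question.
city_clean = pd.DataFrame({
    'signup_date': pd.to_datetime(['2019-12-01', '2020-03-01']),
    'last_trip_date': pd.to_datetime(['2020-02-01', '2020-05-31']),
})

def diff_month_series(later, earlier):
    """Calendar-month difference between two datetime64 Series, ignoring days."""
    return (later.dt.year - earlier.dt.year) * 12 + (later.dt.month - earlier.dt.month)

city_clean['months_active'] = diff_month_series(
    city_clean['last_trip_date'], city_clean['signup_date']
)
print(city_clean['months_active'].tolist())  # [2, 2]
```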
Upvotes: 5