Reputation: 43
I'm trying to create a new column in a dataframe that calculates a person's age with dateutil's relativedelta function, using the following code:
df['Age'] = relativedelta(df['Today'], df['DOB']).years
However, I'm getting the following error:
ValueError Traceback (most recent call last)
<ipython-input-99-f87ca88a2e3c> in <module>()
1
----> 2 df['Years of Age2'] = relativedelta(df['Today'], df['DOB']).years
C:\anaconda3\lib\site-packages\dateutil\relativedelta.py in __init__(self, dt1, dt2, years, months, days, leapdays, weeks, hours, minutes, seconds, microseconds, year, month, day, weekday, yearday, nlyearday, hour, minute, second, microsecond)
101 "ambiguous and not currently supported.")
102
--> 103 if dt1 and dt2:
104 # datetime is a subclass of date. So both must be date
105 if not (isinstance(dt1, datetime.date) and
C:\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It works outside a dataframe, as shown below:
DOB = datetime.date(1990,8,25)
Today = datetime.date.today()
relativedelta(Today, DOB).years
Out[2]: 29
=====================================================================
So I assume I am doing something wrong when passing the datatypes from the dataframe to the function?
I can calculate age a different way with the code below; I just don't understand why the first method isn't working.
df['Years of Age'] = np.round((df['Today'] - df['DOB'])/np.timedelta64(1,'Y'),decimals = 0)
Here is the starter code:
import pandas as pd
import numpy as np
import datetime
from dateutil.relativedelta import relativedelta
ind = 'Andy Brandy Cindy'
MyDict = {"DOB" : [ (datetime.date(1954,7,5)),
(datetime.date(1998,1,27)),
(datetime.date(2001,3,15)) ]}
df = pd.DataFrame(data=MyDict,index=ind.split())
df['Today'] = datetime.date.today()
df
DOB Today
Andy 1954-07-05 2019-08-30
Brandy 1998-01-27 2019-08-30
Cindy 2001-03-15 2019-08-30
Here is the calculation:
df['Age'] = relativedelta(df['Today'], df['DOB']).years
Upvotes: 4
Views: 1537
Reputation: 29635
I don't think relativedelta can accept pandas Series as parameters. The traceback shows that the failure happens at line 103 of relativedelta.py, where the arguments dt1 and dt2 (in your code, the Series df['Today'] and df['DOB']) are tested for truthiness before the isinstance check against datetime.date. pandas then raises the ValueError because the truth value of a Series is ambiguous. As you found yourself, it works outside of a dataframe because you pass datetime.date objects directly rather than Series. So you could use apply to calculate the difference between the two date objects row-wise:
df['Age'] = df.apply(lambda x: relativedelta(x['Today'], x['DOB']).years, axis=1)
However, I think the workaround you found is faster, though perhaps not as precise as using relativedelta.
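For illustration, here is a minimal sketch that reproduces the cause of the error, runs the row-wise apply, and then tries a vectorized alternative. The vectorized part is my own assumption and not something relativedelta provides: it subtracts the birth year and takes one year off when the birthday has not yet occurred this year. The fixed "today" date and the column name Age_vectorized are only there to make the example reproducible, and 29 February birthdays may be handled differently from relativedelta.

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

# Sample data from the question, with a fixed "today" (assumption) so the
# result does not depend on when the code is run.
df = pd.DataFrame(
    {"DOB": [datetime.date(1954, 7, 5),
             datetime.date(1998, 1, 27),
             datetime.date(2001, 3, 15)]},
    index=["Andy", "Brandy", "Cindy"],
)
df["Today"] = datetime.date(2019, 8, 30)

# Cause of the error: testing a multi-element Series for truthiness is
# ambiguous, and relativedelta does exactly that with its arguments.
try:
    bool(df["Today"])
except ValueError as exc:
    print(exc)

# Row-wise approach: each call receives plain datetime.date objects.
df["Age"] = df.apply(
    lambda row: relativedelta(row["Today"], row["DOB"]).years, axis=1
)

# Vectorized sketch (assumption, not part of dateutil): year difference,
# minus one if this year's birthday has not happened yet.
dob = pd.to_datetime(df["DOB"])
today = pd.to_datetime(df["Today"])
had_birthday = (today.dt.month > dob.dt.month) | (
    (today.dt.month == dob.dt.month) & (today.dt.day >= dob.dt.day)
)
df["Age_vectorized"] = today.dt.year - dob.dt.year - (~had_birthday).astype(int)

print(df)
```

On this sample data both columns should agree; on a large dataframe the vectorized version avoids calling a Python function once per row.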
Upvotes: 4