Reputation: 5097
I have a data-frame df
with a 'date of birth'
column the head of which the head looks which looks like:
DoB
0 19/12/1973
1 22/05/1969
2 23/09/1959
3 04/08/1963
4 25/03/1962
I am trying to create a new column which has the number of years between this column and a reference date variable called refDate which is set to: 2017-03-31
and has a type of <class 'datetime.date'>
. I am looking for the folling new column in the df dataframe
Age
43
47
57
53
55
I have tried the follwing:
df['Age'] = df.loc[(refDate - ['Date of birth']).years]
but can't get a new column called Age
. Where have I gone wrong?
Upvotes: 1
Views: 97
Reputation: 9335
Use apply to get the date diff in years
df.DoB.apply(lambda x: (refDate - x).days // 365)
Demo
In [1]: df.DoB = pd.to_datetime(df.DoB)
In [2]: refDate = datetime(2017, 3, 31)
In [3]: df['age'] = df.DoB.apply(lambda x: (refDate - x).days // 365)
In [4]: df
Out[4]:
DoB age
0 1973-12-19 43
1 1969-05-22 47
2 1959-09-23 57
3 1963-04-08 54
4 1962-03-25 55
Upvotes: 0
Reputation: 13255
Your line for df['Age']
is not correct, Use the following to get the desired output, First substract df['DoB']
from refDate
then divide with timedelta
for conversion of days to years as :
df['DoB'] = pd.to_datetime(df['DoB'])
df['Age'] = (pd.to_datetime('2017-03-31') - df['DoB'])/np.timedelta64(1, 'Y')
print(df)
DoB Age
0 1973-12-19 43.280834
1 1969-05-22 47.858614
2 1959-09-23 57.520688
3 1963-04-08 53.980575
4 1962-03-25 55.018241
Or If you want years in integers then:
df['Age'] = ((pd.to_datetime('2017-03-31') - df['DoB'])/np.timedelta64(1, 'Y')).astype(int)
print(df)
DoB Age
0 1973-12-19 43
1 1969-05-22 47
2 1959-09-23 57
3 1963-04-08 53
4 1962-03-25 55
Upvotes: 3
Reputation: 20125
Do not use .loc
. With .loc
you access a group of rows and columns by label(s) or a boolean array.
Upvotes: 2