Stacey
Stacey

Reputation: 5097

Find time between dataframe date column and a reference date in Python

I have a data-frame df with a 'date of birth' column the head of which the head looks which looks like:

            DoB
0    19/12/1973
1    22/05/1969
2    23/09/1959
3    04/08/1963
4    25/03/1962

I am trying to create a new column which has the number of years between this column and a reference date variable called refDate which is set to: 2017-03-31 and has a type of <class 'datetime.date'>. I am looking for the folling new column in the df dataframe

Age
43
47
57
53
55

I have tried the follwing:

df['Age'] = df.loc[(refDate - ['Date of birth']).years]

but can't get a new column called Age. Where have I gone wrong?

Upvotes: 1

Views: 97

Answers (3)

Praveen
Praveen

Reputation: 9335

Use apply to get the date diff in years

df.DoB.apply(lambda x: (refDate - x).days // 365)

Demo

In [1]: df.DoB = pd.to_datetime(df.DoB)

In [2]: refDate = datetime(2017, 3, 31)

In [3]: df['age'] = df.DoB.apply(lambda  x: (refDate - x).days // 365)

In [4]: df
Out[4]:
         DoB  age
0 1973-12-19   43
1 1969-05-22   47
2 1959-09-23   57
3 1963-04-08   54
4 1962-03-25   55

Upvotes: 0

Space Impact
Space Impact

Reputation: 13255

Your line for df['Age'] is not correct, Use the following to get the desired output, First substract df['DoB'] from refDate then divide with timedelta for conversion of days to years as :

df['DoB'] = pd.to_datetime(df['DoB'])
df['Age'] = (pd.to_datetime('2017-03-31') - df['DoB'])/np.timedelta64(1, 'Y')

print(df)
         DoB        Age
0 1973-12-19  43.280834
1 1969-05-22  47.858614
2 1959-09-23  57.520688
3 1963-04-08  53.980575
4 1962-03-25  55.018241

Or If you want years in integers then:

df['Age'] = ((pd.to_datetime('2017-03-31') - df['DoB'])/np.timedelta64(1, 'Y')).astype(int)
print(df)
         DoB  Age
0 1973-12-19   43
1 1969-05-22   47
2 1959-09-23   57
3 1963-04-08   53
4 1962-03-25   55

Upvotes: 3

Michael Dorner
Michael Dorner

Reputation: 20125

Do not use .loc. With .loc you access a group of rows and columns by label(s) or a boolean array.

Upvotes: 2

Related Questions