Reputation: 23
I have a pandas DataFrame with a column 'dob' (date of birth), and I want to compute each person's age as of today.
I used the datetime module to get today's date, subtracted the 'dob' field from it, and divided the number of days by 365 to get the age in years.
This is a rather crude approach, I admit, so I am looking for hints on how to do it more elegantly.
# -*- coding: utf-8 -*-
import pandas as pd
from datetime import datetime
today = datetime.today()
df = pd.read_csv(pathtocsvfile, parse_dates=['dob'])  # pathtocsvfile: path to the input CSV
# Elapsed days since birth, integer-divided by 365
df['age'] = df['dob'].apply(lambda x: (today - x).days // 365)
I believe the code works as it is; however, I am not sure how much leap years affect the result.
I am also looking for a more elegant way to do this.
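To make the leap-year concern concrete, here is a minimal check with fixed, made-up dates (chosen only for illustration). Dividing elapsed days by 365 ignores the extra leap days, so shortly before a birthday the result can be one year too high:

```python
from datetime import date

dob = date(2000, 1, 10)
today = date(2020, 1, 5)  # five days before the 20th birthday

days = (today - dob).days
# 7305 days up to the 20th birthday (20 * 365 plus leap days in
# 2000, 2004, 2008, 2012, 2016), minus the 5 days still to go
print(days)         # 7300
print(days // 365)  # 20, although the person is still 19
```

The error grows with age, because roughly one extra day accumulates every four years.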
Upvotes: 1
Views: 8215
Reputation: 21
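To calculate the age, encode each date as an integer of the form YYYYMMDD, subtract the date of birth from today's date, and integer-divide the difference by 10000; the quotient is the number of completed years.
In code:
from datetime import datetime

dob = '17-12-1965'
dob_date = datetime.strptime(dob, '%d-%m-%Y')
now_date = datetime.today()
# Encode both dates as YYYYMMDD integers; floor division by 10000 keeps
# the year difference, minus 1 if this year's birthday has not come yet
age = (
    (now_date.year * 10000 + now_date.month * 100 + now_date.day)
    - (dob_date.year * 10000 + dob_date.month * 100 + dob_date.day)
) // 10000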
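The same YYYYMMDD arithmetic can be wrapped in a function that takes the reference date as a parameter, which makes it easy to check against fixed dates. This is a sketch; `age_from_yyyymmdd` is a hypothetical helper name, not part of any library:

```python
from datetime import date

def age_from_yyyymmdd(dob: date, today: date) -> int:
    """Completed years between dob and today, via YYYYMMDD integer encoding."""
    def encode(d: date) -> int:
        # e.g. date(2000, 1, 10) -> 20000110
        return d.year * 10000 + d.month * 100 + d.day
    # The MMDD parts differ by less than 10000, so floor division yields the
    # year difference, reduced by 1 when today's MMDD is before the birthday's
    return (encode(today) - encode(dob)) // 10000

print(age_from_yyyymmdd(date(2000, 1, 10), date(2020, 1, 5)))   # 19
print(age_from_yyyymmdd(date(2000, 1, 10), date(2020, 1, 10)))  # 20
print(age_from_yyyymmdd(date(2000, 2, 29), date(2001, 2, 28)))  # 0
print(age_from_yyyymmdd(date(2000, 2, 29), date(2001, 3, 1)))   # 1
```

Note that someone born on 29 February is treated as turning a year older on 1 March in non-leap years.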
Upvotes: 2
Reputation: 341
If you want the age in completed years, I would suggest this:
df['age'] = df['dob'].apply(
    # Subtract 1 (the comparison is True) if this year's birthday has not happened yet
    lambda x: today.year - x.year -
    ((today.month, today.day) < (x.month, x.day))
)
rather than taking the number of days and dividing by 365, which ignores leap days and can give an age one year too high shortly before a birthday.
This topic is also discussed here: Age from birthdate in python
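For larger frames, the same birthday comparison can be vectorized with the `.dt` accessor instead of `apply`. A minimal sketch, assuming a fixed reference date and made-up birth dates so the output is reproducible (in practice `today` would be `pd.Timestamp.today()`):

```python
import pandas as pd

# Fixed reference date for reproducibility (assumption)
today = pd.Timestamp("2024-06-15")
df = pd.DataFrame({"dob": pd.to_datetime(["2000-06-15", "2000-06-16", "1965-12-17"])})

dob = df["dob"].dt
# True where this year's birthday is still ahead of `today`
birthday_ahead = (dob.month > today.month) | (
    (dob.month == today.month) & (dob.day > today.day)
)
df["age"] = today.year - dob.year - birthday_ahead.astype(int)
print(df["age"].tolist())  # [24, 23, 58]
```

The boolean mask plays the role of the tuple comparison in the lambda: it is 1 exactly when `(today.month, today.day) < (x.month, x.day)`.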
Upvotes: 7
Reputation: 16942
You are introducing inaccuracy by insisting on counting in years. Your purpose will just as well be served by an age in days, which you already have. Just drop the integer division by 365.
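If an age in days is enough, pandas can compute it for the whole column at once. A small sketch with a fixed reference date and illustrative birth dates (both assumptions):

```python
import pandas as pd

today = pd.Timestamp("2020-01-05")  # fixed date for reproducibility
df = pd.DataFrame({"dob": pd.to_datetime(["2000-01-10", "2019-12-31"])})
# Timestamp minus a datetime64 column gives a timedelta64 column;
# .dt.days extracts the whole number of days
df["age_days"] = (today - df["dob"]).dt.days
print(df["age_days"].tolist())  # [7300, 5]
```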
Upvotes: 0