Reputation: 55
I'm trying to subtract column df['date_of_admission']
from the column df['DOB']
to find the difference between then and store the age value in df['age']
column, however, I'm getting this error:
OverflowError: Overflow in int64 addition
DOB date_of_admission age
2000-05-07 2019-01-19 12:26:00
1965-01-30 2019-03-21 02:23:12
NaT 2018-11-02 18:30:10
1981-05-01 2019-05-08 12:26:00
1957-01-10 2018-12-31 04:01:15
1968-07-14 2019-01-28 15:05:09
NaT 2018-04-13 06:20:01
NaT 2019-02-15 01:01:57
2001-02-10 2019-03-21 08:22:00
1990-03-29 2018-11-29 03:05:03
..... ......
..... .....
..... .....
I've tried it with the following:
import numpy as np
import pandas as pd
from datetime import dt
df['age'] = (df['date_of_admission'] - df['DOB']).dt.days // 365
Expected to get the following age column after finding the difference between:
age
26
69
NaN
58
.
.
.
Upvotes: 3
Views: 13372
Reputation: 41
None of the above answers worked for me.
Based on the answer here, try following code:
df['AGE'] = df.apply(lambda s: (s['date_of_admission'].to_pydatetime()-s['DOB'].to_pydatetime()).days / 365.0, axis=1)`
I should also mention that I used pandas==1.5
Upvotes: 1
Reputation: 41
For those getting errors AttributeError: Can only use .dt accessor with datetimelike values
from @tawab_shakeel's answer, try following code
import pandas as pd
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']) //365).dt.days
It will return int value for age
Upvotes: 0
Reputation: 11
I think we all meet this problem dealing MIMIC3 database, here is my solution, very intuitive, but much faster, you can have a try.
df_labevents_temp['age'] = ((df['date_of_admission'].values - df['DOB'].values).astype(np.int)/8.64e13//365).astype(np.int)
The idea is to turn it to numpy int, then convert from ns to years. The pandas date diff function doesn't work for me, using apply is too slow (I think just loop).
Upvotes: 1
Reputation: 1
I'm using the MIMIC III dataset and encountered this problem as well. I found that the comment by @Rahul_chacharan to use df['date_of_admission'].subtract(df['DOB']).dt.days // 365
worked for me!
Upvotes: 0
Reputation: 4347
OP is most likely using the medical MIMIC dataset, where dates have been scrambled to protect the identities of patients. Specifically, for patients over age 89, they shifted the date of birth by 300 years.
Long timespans such as this give an overflow when using pandas timedelta:
pd.to_timedelta(300, unit="Y", box=False)
> numpy.timedelta64(-8979658473709551616,'ns')
When this occurs in a dataframe operation, you'll experience an error instead. As adapted from the answer of @tawab_shakeel :
df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','1700-01-01'],
"date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2000-01-01 02:23:23"]})
df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
# Gives AttributeError: Can only use .dt accessor with datetimelike values
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365
# Gives OverflowError: long too big to convert
pd.to_timedelta(df['date_of_admission']-df['DOB'])
This problem occurs with any calculation that converts to a timedelta64[ns]
datatype.
As a workaround, you can use an apply
operation instead, directly calculating the age element per element:
df['age'] = df.apply(lambda e: (e['date_of_admission'] - e['DOB']).days/365, axis=1)
Upvotes: 9
Reputation: 837
1). You are doing it correctly but the DOB
contains the only date AND date_of_admission
contains both date and time. Manipulate the date_of_admission
so that it will only contain the date, then you will get your result.
2). Here I am adding a change function
into your code so that you will get your result.
import numpy as np
import pandas as pd
from datetime import dt
def change(x):
return x.date()
df['date_of_admission'] = df['date_of_admission'].apply(change)
df['age'] = df['date_of_admission'].subtract(df['DOB']).dt.days // 365
I hope it will help you.
Upvotes: 1
Reputation: 3739
Convert both columns into date then subtract it
import pandas as pd
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365
SECOND TEST
#Now I have use DOB AND date_of_admission data from the question and it is working fine
df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','NaT'],
"date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2018-11-02 18:30:10"]})
df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365
RESULT:
DOB date_of_admission age
2000-05-07 2019-01-19 18.0
1965-01-30 2019-03-21 54.0
NaT 2018-11-02 NaN
Upvotes: 3