bilal
bilal

Reputation: 55

How to fix OverflowError: Overflow in int64 addition

I'm trying to subtract column df['date_of_admission'] from the column df['DOB'] to find the difference between then and store the age value in df['age'] column, however, I'm getting this error:

OverflowError: Overflow in int64 addition

 DOB          date_of_admission      age
 2000-05-07   2019-01-19 12:26:00        
 1965-01-30   2019-03-21 02:23:12        
 NaT          2018-11-02 18:30:10        
 1981-05-01   2019-05-08 12:26:00       
 1957-01-10   2018-12-31 04:01:15         
 1968-07-14   2019-01-28 15:05:09            
 NaT          2018-04-13 06:20:01 
 NaT          2019-02-15 01:01:57 
 2001-02-10   2019-03-21 08:22:00       
 1990-03-29   2018-11-29 03:05:03
.....         ......
.....         .....
.....         .....

I've tried it with the following:

import numpy as np
import pandas as pd
from datetime import dt

df['age'] = (df['date_of_admission'] - df['DOB']).dt.days // 365

Expected to get the following age column after finding the difference between:

age
26
69
NaN
58
.
.
.

Upvotes: 3

Views: 13372

Answers (7)

hmadinei
hmadinei

Reputation: 41

None of the above answers worked for me.

Based on the answer here, try following code:

df['AGE'] = df.apply(lambda s: (s['date_of_admission'].to_pydatetime()-s['DOB'].to_pydatetime()).days / 365.0, axis=1)`

I should also mention that I used pandas==1.5

Upvotes: 1

little_thumb
little_thumb

Reputation: 41

For those getting errors AttributeError: Can only use .dt accessor with datetimelike values from @tawab_shakeel's answer, try following code

import pandas as pd

df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

df['DOB'] = pd.to_datetime(df['DOB']).dt.date

df['age'] = ((df['date_of_admission']-df['DOB']) //365).dt.days

It will return int value for age

Upvotes: 0

Hao Wu
Hao Wu

Reputation: 11

I think we all meet this problem dealing MIMIC3 database, here is my solution, very intuitive, but much faster, you can have a try.

df_labevents_temp['age']  = ((df['date_of_admission'].values  - df['DOB'].values).astype(np.int)/8.64e13//365).astype(np.int)

The idea is to turn it to numpy int, then convert from ns to years. The pandas date diff function doesn't work for me, using apply is too slow (I think just loop).

Upvotes: 1

laxolotl
laxolotl

Reputation: 1

I'm using the MIMIC III dataset and encountered this problem as well. I found that the comment by @Rahul_chacharan to use df['date_of_admission'].subtract(df['DOB']).dt.days // 365 worked for me!

Upvotes: 0

DieterDP
DieterDP

Reputation: 4347

OP is most likely using the medical MIMIC dataset, where dates have been scrambled to protect the identities of patients. Specifically, for patients over age 89, they shifted the date of birth by 300 years.

Long timespans such as this give an overflow when using pandas timedelta:

pd.to_timedelta(300, unit="Y", box=False)
> numpy.timedelta64(-8979658473709551616,'ns')

When this occurs in a dataframe operation, you'll experience an error instead. As adapted from the answer of @tawab_shakeel :

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','1700-01-01'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2000-01-01 02:23:23"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

# Gives AttributeError: Can only use .dt accessor with datetimelike values
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

# Gives OverflowError: long too big to convert
pd.to_timedelta(df['date_of_admission']-df['DOB'])

This problem occurs with any calculation that converts to a timedelta64[ns] datatype.

As a workaround, you can use an apply operation instead, directly calculating the age element per element:

df['age'] = df.apply(lambda e: (e['date_of_admission'] - e['DOB']).days/365, axis=1)

Upvotes: 9

Rahul charan
Rahul charan

Reputation: 837

1). You are doing it correctly but the DOB contains the only date AND date_of_admission contains both date and time. Manipulate the date_of_admission so that it will only contain the date, then you will get your result.

2). Here I am adding a change function into your code so that you will get your result.

import numpy as np
import pandas as pd
from datetime import dt

def change(x):
    return x.date()

df['date_of_admission'] = df['date_of_admission'].apply(change)

df['age'] = df['date_of_admission'].subtract(df['DOB']).dt.days // 365

I hope it will help you.

Upvotes: 1

tawab_shakeel
tawab_shakeel

Reputation: 3739

Convert both columns into date then subtract it

import pandas as pd


df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

df['DOB'] = pd.to_datetime(df['DOB']).dt.date

df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

SECOND TEST

#Now I have use DOB AND date_of_admission data from the question and it is working fine

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','NaT'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2018-11-02 18:30:10"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

RESULT:

DOB       date_of_admission   age
2000-05-07  2019-01-19       18.0
1965-01-30  2019-03-21       54.0
NaT         2018-11-02       NaN

Upvotes: 3

Related Questions