StephanH

Reputation: 51

Python Pandas out of bounds datetime timestamp error for long dataframe

I have a data frame df with shape (500000, 70). Several of its columns contain invalid dates such as 4000-01-01 00:00:00. On a smaller version of this data frame I tried

df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)

which worked fine. This version also worked:

df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00", pd.NaT))

On the full data frame, however, I get the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00

Any suggestions on how to solve this problem elegantly, or on what the cause might be?

Thank you.

Upvotes: 1

Views: 2229

Answers (2)

Mayank Porwal

Reputation: 34086

The error occurs because pandas timestamps have an upper limit:

In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')

This is the latest date a nanosecond timestamp can hold. Your value, 4000-01-01 00:00:00, lies beyond it, hence the OutOfBoundsDatetime error.
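
To show where that limit comes from, here is a minimal sketch. It assumes the usual nanosecond representation (a signed 64-bit count of nanoseconds since 1970-01-01) and rebuilds the upper bound with the standard library. The helper fits_in_ns is a hypothetical name used only for illustration, and it checks the upper bound only.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Nanosecond timestamps are stored as a signed 64-bit integer counting
# nanoseconds since 1970-01-01, so the largest integer sets the latest date.
max_ns = int(np.iinfo(np.int64).max)
epoch = datetime(1970, 1, 1)

# Rebuild the upper limit with the standard library. Python datetimes only
# have microsecond precision, so the last three digits are dropped.
upper = epoch + timedelta(microseconds=max_ns // 1000)
print(upper)
print(pd.Timestamp.max)


def fits_in_ns(value):
    """Hypothetical helper: True if a datetime does not exceed the upper limit."""
    return value <= upper


print(fits_in_ns(datetime(2021, 5, 1)))
print(fits_in_ns(datetime(4000, 1, 1)))
```

The year 4000 is far past the limit, which is why converting it to a nanosecond timestamp fails.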

Upvotes: 1

jezrael

Reputation: 863156

If you add the parameter errors='coerce' to the to_datetime function, it returns NaT for every value that cannot be parsed as a datetime:

df["date"] = pd.to_datetime(df["date"], errors='coerce')
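
Since the long data frame may contain other bad values besides 4000-01-01 00:00:00, it can help to see which raw values were coerced. The following is only a sketch: the sample data is made up to stand in for the "date" column, and whether the year-4000 row ends up as NaT depends on the installed pandas version being limited to nanosecond timestamps, as the error in the question suggests.

```python
import pandas as pd

# Hypothetical sample standing in for the "date" column from the question.
df = pd.DataFrame(
    {"date": ["2021-05-01 00:00:00", "4000-01-01 00:00:00", "not a date", None]}
)

# errors="coerce" turns values that cannot be converted into NaT
# instead of raising an exception.
parsed = pd.to_datetime(df["date"], errors="coerce")

# Rows that held a value before parsing but are NaT afterwards were coerced.
coerced = df.loc[parsed.isna() & df["date"].notna(), "date"]
print(coerced.unique())

# Store the parsed result back once the coerced values look acceptable.
df["date"] = parsed
```

Inspecting coerced.unique() before overwriting the column shows whether only the expected placeholder dates were dropped or whether real data was lost too.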

Upvotes: 2
