Reputation: 51
I have a data frame df with shape (500000, 70) and several columns that contain invalid dates such as 4000-01-01 00:00:00. In a smaller version of this data frame I tried
df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)
which worked fine. Also the version
df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00",pd.NaT))
worked. With the full-size data frame, however, I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00
Any suggestions how to solve this problem in an elegant way or what the problem might be?
Thank you.
Upvotes: 1
Views: 2229
Reputation: 34086
The error is because:
In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')
This is the upper limit of a pandas Timestamp. Your value, 4000-01-01 00:00:00, lies beyond that limit, hence the OutOfBoundsDatetime error.
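To see where that limit comes from: a nanosecond-resolution timestamp is stored as a signed 64-bit count of nanoseconds since 1970-01-01. The sketch below reproduces the bound with the standard library only. The names INT64_MAX_NS, EPOCH and upper are made up for illustration, and datetime only goes down to microseconds, so the last three digits are truncated.

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold, read as nanoseconds
INT64_MAX_NS = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# datetime has microsecond resolution, so drop the last three digits
upper = EPOCH + timedelta(microseconds=INT64_MAX_NS // 1000)
print(upper)  # 2262-04-11 23:47:16.854775

# The placeholder date from the question is far beyond that bound
print(datetime(4000, 1, 1) > upper)  # True
```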
Upvotes: 1
Reputation: 863156
If you add the parameter errors='coerce' to the to_datetime function, it returns NaT for every value that cannot be parsed as a datetime:
df["date"] = pd.to_datetime(df["date"], errors='coerce')
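A small sketch of how this could be combined with the explicit replacement from the question, assuming the column holds strings in a single format. The three sample rows and the names sentinel, as_text, cleaned and n_missing are invented for illustration. Because errors='coerce' silently turns any other unparseable value into NaT as well, it can be worth counting the missing values afterwards.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 500000-row frame
df = pd.DataFrame({
    "date": [
        "2020-05-17 00:00:00",
        "4000-01-01 00:00:00",  # placeholder for "no real date"
        "2021-01-01 12:30:00",
    ]
})

sentinel = "4000-01-01 00:00:00"

# Blank out the placeholder explicitly, then parse the rest
as_text = df["date"].astype(str)
cleaned = as_text.mask(as_text == sentinel)
df["date"] = pd.to_datetime(cleaned, errors="coerce")

# Count how many rows ended up without a date
n_missing = int(df["date"].isna().sum())
print(df)
print(n_missing)
```

Masking the placeholder first keeps the result independent of how a given pandas version treats far-future dates; errors='coerce' then only has to catch whatever else fails to parse.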
Upvotes: 2