Reputation: 1291
I have a service running pandas version 0.25.2. This service reads data from a database and stores a snapshot as a CSV file:
df = pd.read_sql_query(sql_cmd, oracle)
The query results in a dataframe containing some very large datetime values (e.g. 3000-01-02 00:00:00).
Afterwards I use df.to_csv(index=False)
to create a CSV snapshot and write it to a file.
On a different machine with pandas 0.25.3 installed, I read the content of the CSV file into a dataframe and try to convert the date column to datetime. This results in an OutOfBoundsDatetime
exception:
df = pd.read_csv("xy.csv")
pd.to_datetime(df['val_until'])
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 3000-01-02 00:00:00
I am thinking about using pickle to create the snapshots and load the dataframes directly. However, I am curious why pandas is able to handle a big datetime in the first case but not in the second. Also, any suggestions on how to keep using CSV as the transfer format are appreciated.
Upvotes: 1
Views: 178
Reputation: 1367
I believe I got it.
In the first case, I'm not sure what data type is actually stored in the SQL database, but unless otherwise specified, reading it into the dataframe likely produces a generic object or string type, which has a much higher (or no) overflow limit.
Either way, it ends up in a CSV file as a string. A string can be arbitrarily long without overflowing, whereas the datetime64[ns] type that pandas.to_datetime casts into has a maximum value of '2262-04-11 23:47:16.854775807', per Timestamp.max in the pandas docs.
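A minimal sketch of the limit and two possible workarounds (the sample CSV content here is made up to mimic your val_until column): either coerce out-of-range values to NaT, or skip to_datetime entirely and keep Python datetime objects in an object-dtype column, which are not bound by the nanosecond ceiling.

```python
from datetime import datetime
from io import StringIO

import pandas as pd

# Simulated CSV snapshot with one out-of-range and one in-range value
csv_data = "val_until\n3000-01-02 00:00:00\n2020-05-01 12:30:00\n"
df = pd.read_csv(StringIO(csv_data))

# The nanosecond-precision ceiling that to_datetime enforces
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# Option 1: turn out-of-range values into NaT instead of raising
coerced = pd.to_datetime(df["val_until"], errors="coerce")

# Option 2: parse into plain Python datetime objects (object dtype),
# which can represent years up to 9999
parsed = df["val_until"].apply(
    lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
)
print(parsed.iloc[0])  # 3000-01-02 00:00:00
```

The trade-off: option 1 silently drops the large dates, while option 2 keeps them but loses the vectorized datetime64 operations, since the column stays object dtype.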
Upvotes: 2