Kantajit
Kantajit

Reputation: 418

How to handle dates which is out of timestamp range in pandas?

I was working with the Crunchbase dataset. I have an entry of Harvard University which was founded in 1636. This entry is giving me an error when I am trying to convert string to DateTime.

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1636-09-08 00:00:00

I found out that pandas support timestamp from 1677

>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')

I checked out some solutions like one suggesting using errors='coerce' but dropping this entry/ making it null is not an option. Can you please suggest a way to handle this issue?

Upvotes: 2

Views: 1133

Answers (1)

Harsh Sharma
Harsh Sharma

Reputation: 313

As mentioned in comments by Henry, there is limitation of pandas timestamps because of its representation in float64, you could probably work around it by parsing the date-time using datetime library when needed, otherwise letting it stay as string or convert it to an integer

Scenario 1: If you plan on showing this value only when you print it

datetime_object = datetime.strptime('1636-09-08 00:00:00', '%Y-%m-%d %H:%M:%S')

Scenario 2: If you want to use it as a date column to retain information in the dataframe, you could additionally

datetime_object.strftime("%Y%m%d%H%M%S")

using it on a column in a pandas dataframe would yield this

df=pd.DataFrame([['1636-09-08 00:00:00'],['1635-09-09 00:00:00']], columns=['dates'])
df['str_date']=df['dates'].apply(lambda x:datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
df.head()
dates str_date
0 1636-09-08 00:00:00 1636-09-08 00:00:00
1 1635-09-09 00:00:00 1635-09-09 00:00:00

pandas treats this column as a object column, but when you access it, it is a datetime column

df['str_date'][0]
>>datetime.datetime(1636, 9, 8, 0, 0)

also, adding this for the sake of completeness: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-oob

Upvotes: 3

Related Questions