Reputation: 418
I was working with the Crunchbase dataset. I have an entry of Harvard University which was founded in 1636. This entry is giving me an error when I am trying to convert string to DateTime.
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1636-09-08 00:00:00
I found out that pandas support timestamp from 1677
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
I checked out some solutions like one suggesting using errors='coerce'
but dropping this entry/ making it null is not an option.
Can you please suggest a way to handle this issue?
Upvotes: 2
Views: 1133
Reputation: 313
As mentioned in comments by Henry, there is limitation of pandas timestamps because of its representation in float64, you could probably work around it by parsing the date-time using datetime library when needed, otherwise letting it stay as string or convert it to an integer
Scenario 1: If you plan on showing this value only when you print it
datetime_object = datetime.strptime('1636-09-08 00:00:00', '%Y-%m-%d %H:%M:%S')
Scenario 2: If you want to use it as a date column to retain information in the dataframe, you could additionally
datetime_object.strftime("%Y%m%d%H%M%S")
using it on a column in a pandas dataframe would yield this
df=pd.DataFrame([['1636-09-08 00:00:00'],['1635-09-09 00:00:00']], columns=['dates'])
df['str_date']=df['dates'].apply(lambda x:datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
df.head()
dates | str_date | |
---|---|---|
0 | 1636-09-08 00:00:00 | 1636-09-08 00:00:00 |
1 | 1635-09-09 00:00:00 | 1635-09-09 00:00:00 |
pandas treats this column as a object column, but when you access it, it is a datetime column
df['str_date'][0]
>>datetime.datetime(1636, 9, 8, 0, 0)
also, adding this for the sake of completeness: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-oob
Upvotes: 3