Reputation: 77
I have a df where one of the column is a date with a datetime64[ns] type.
Over this column I want to add months using another column of the dataframe as a base:
df['date_shifted']=df['date'].values.astype('datetime64[M]')+(df['months']).values.astype('timedelta64[M]')
My problem comes when I exceed the maximum of the datetime64 type and I retrieve the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2846-04-30 00:00:00
Are there any way to work around this error and add the months that I need to my dataframe?
Some example of possible data which would be an error after the calculation:
date | months |
---|---|
28-01-2017 | 9999 |
13-05-2018 | 9999 |
22-03-2016 | 9999 |
05-12-2007 | 9999 |
Note: I know that I can coerce the errors to NaT but I need the dates for following calculations.
Upvotes: 0
Views: 105
Reputation: 11395
You can use Periods, as per the Representing out-of-bounds spans section of the guide on timestamps posted by @HenryEcker in comments. To convert the column simply use .dt.to_period()
:
>>> df['date'].dt.to_period(freq='M')
0 2017-01
1 2018-05
2 2016-03
3 2007-05
Name: date, dtype: period[M]
The rest is easy, adding the int64
months can even be done without conversion:
>>> df['shifted_date'] = df['date'].dt.to_period(freq='M') + df['months']
>>> df
date months shifted_date
0 2017-01-28 9999 2850-04
1 2018-05-13 9999 2851-08
2 2016-03-22 9999 2849-06
3 2007-05-12 9999 2840-08
>>> df['shifted_date']
0 2850-04
1 2851-08
2 2849-06
3 2840-08
Name: shifted_date, dtype: period[M]
Based on the dates you have you could use a smaller granularity period:
>>> df['shifted_date'].astype('Period[D]')
0 2850-04-30
1 2851-08-31
2 2849-06-30
3 2840-08-31
Name: shifted_date, dtype: period[D]
Going back to datetimes would trigger the overflow you’re trying to avoid:
>>> df['shifted_date'].dt.start_time
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.8/site-packages/pandas/core/accessor.py", line 78, in _getter
return self._delegate_property_get(name)
File "/usr/lib64/python3.8/site-packages/pandas/core/indexes/accessors.py", line 70, in _delegate_property_get
result = getattr(values, name)
File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 420, in start_time
return self.to_timestamp(how="start")
File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 465, in to_timestamp
new_data = libperiod.periodarr_to_dt64arr(new_data.asi8, base)
File "pandas/_libs/tslibs/period.pyx", line 977, in pandas._libs.tslibs.period.periodarr_to_dt64arr
File "pandas/_libs/tslibs/conversion.pyx", line 246, in pandas._libs.tslibs.conversion.ensure_datetime64ns
File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2850-04-01 00:00:00
Upvotes: 1