Reputation: 497
I want to convert this Origin
column in the dataframe data_copy
to UTC datetime format
import pandas as pd
>>>data_copy["Origin"]
0 1669-06-04 00:00:00
1 1669-06-22 00:00:00
2 1720-07-15 00:00:00
3 1803-09-01 00:00:00
4 1816-05-26 00:00:00
6395 2020-03-29 18:27:36
6396 2020-03-29 18:47:53
6397 2020-03-29 20:05:19
6398 2020-03-30 02:19:27
6399 2020-03-30 06:11:36
There is also some data entries with 00:00:00
Time (I need to convert this also)
I tried this command
data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],infer_datetime_format=True)
But I am getting error like this
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2054, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas\_libs\tslibs\conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-93-aead2d23f264>", line 1, in <module>
data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],infer_datetime_format=True)
File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
values = convert_listlike(arg._values, format)
File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 466, in _convert_listlike_datetimes
allow_object=True,
File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2059, in objects_to_datetime64ns
raise e
File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2050, in objects_to_datetime64ns
require_iso8601=require_iso8601,
File "pandas\_libs\tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 574, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 570, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 546, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslibs\np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1669-06-04 00:00:00
How could I convert the column into UTC datetime format?
Upvotes: 1
Views: 1167
Reputation: 25564
If you need the datetimes anyway, you can use Python's datetime class. However, this will leave you with columns of dtype object, meaning that pandas' datetime functionality (dt accessor) is not available. Ex:
from datetime import datetime, timezone
import pandas as pd
s = (pd.Series(["1669-06-04 00:00:00", "1816-05-26 00:00:00", "2020-03-29 18:27:36"])
.apply(lambda t: datetime.fromisoformat(t).replace(tzinfo=timezone.utc)))
# s
# 0 1669-06-04 00:00:00+00:00
# 1 1816-05-26 00:00:00+00:00
# 2 2020-03-29 18:27:36+00:00
# dtype: object
You can still access the datetime class' methods but this then requires iteration (apply
).
Upvotes: 1
Reputation: 862731
Here is problem datetimes are outside limits in pandas link:
In [92]: pd.Timestamp.min
Out[92]: Timestamp('1677-09-21 00:12:43.145225')
In [93]: pd.Timestamp.max
Out[93]: Timestamp('2262-04-11 23:47:16.854775807')
Possible solution is replace values to NaT
by errors='coerce'
parameter:
data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],
infer_datetime_format=True,
errors='coerce')
Upvotes: 1