VGB
VGB

Reputation: 497

How to convert dataframe column into UTC datetime format?

I want to convert this Origin column in the dataframe data_copy to UTC datetime format

import pandas as pd

>>>data_copy["Origin"]
 
0       1669-06-04 00:00:00
1       1669-06-22 00:00:00
2       1720-07-15 00:00:00
3       1803-09-01 00:00:00
4       1816-05-26 00:00:00
        
6395    2020-03-29 18:27:36
6396    2020-03-29 18:47:53
6397    2020-03-29 20:05:19
6398    2020-03-30 02:19:27
6399    2020-03-30 06:11:36

There is also some data entries with 00:00:00 Time (I need to convert this also) I tried this command data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],infer_datetime_format=True) But I am getting error like this

Traceback (most recent call last):

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2054, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)

  File "pandas\_libs\tslibs\conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64

TypeError: Unrecognized value type: <class 'str'>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "<ipython-input-93-aead2d23f264>", line 1, in <module>
    data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],infer_datetime_format=True)

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 466, in _convert_listlike_datetimes
    allow_object=True,

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2059, in objects_to_datetime64ns
    raise e

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2050, in objects_to_datetime64ns
    require_iso8601=require_iso8601,

  File "pandas\_libs\tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 574, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 570, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 546, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslibs\np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1669-06-04 00:00:00

How could I convert the column into UTC datetime format?

Upvotes: 1

Views: 1167

Answers (2)

FObersteiner
FObersteiner

Reputation: 25564

If you need the datetimes anyway, you can use Python's datetime class. However, this will leave you with columns of dtype object, meaning that pandas' datetime functionality (dt accessor) is not available. Ex:

from datetime import datetime, timezone
import pandas as pd

s = (pd.Series(["1669-06-04 00:00:00", "1816-05-26 00:00:00", "2020-03-29 18:27:36"])
                 .apply(lambda t: datetime.fromisoformat(t).replace(tzinfo=timezone.utc)))

# s
# 0    1669-06-04 00:00:00+00:00
# 1    1816-05-26 00:00:00+00:00
# 2    2020-03-29 18:27:36+00:00
# dtype: object

You can still access the datetime class' methods but this then requires iteration (apply).

Upvotes: 1

jezrael
jezrael

Reputation: 862731

Here is problem datetimes are outside limits in pandas link:

In [92]: pd.Timestamp.min
Out[92]: Timestamp('1677-09-21 00:12:43.145225')

In [93]: pd.Timestamp.max
Out[93]: Timestamp('2262-04-11 23:47:16.854775807')

Possible solution is replace values to NaT by errors='coerce' parameter:

data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],
                                     infer_datetime_format=True, 
                                     errors='coerce')

Upvotes: 1

Related Questions