Reputation: 391
I have a pandas DataFrame with several columns, two of which are start_time and end_time. The values in those columns look like this: 2020-01-04 01:38:33 +0000 UTC
I can't convert these strings to datetime objects because I can't get the format right:
df['start_time'] = pd.to_datetime(df['start_time'], format="yyyy-MM-dd HH:mm:ss +0000 UTC")
I also tried yyyy-MM-dd HH:mm:ss %z UTC as the format.
This gives the error:
ValueError: time data '2020-01-04 01:38:33 +0000 UTC' does not match format 'yyyy-MM-dd HH:mm:ss +0000 UTC' (match)
Upvotes: 1
Views: 240
Reputation: 603
Two notes on this problem:
1. About your error
The format string you passed is invalid, which causes the error. pandas expects strftime-style directives such as %Y and %m, not patterns like yyyy-MM-dd. See https://strftime.org/ for the full list. The correct format for this data is: "%Y-%m-%d %H:%M:%S %z UTC"
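As a quick check of that format string, here is a minimal sketch that uses only the standard library, with the sample value from the question. The variable names are illustrative, not from the original post.

```python
from datetime import datetime, timezone

# Sample value copied from the question
sample = "2020-01-04 01:38:33 +0000 UTC"

# %z consumes the "+0000" offset; the trailing " UTC" is matched as literal text
fmt = "%Y-%m-%d %H:%M:%S %z UTC"

parsed = datetime.strptime(sample, fmt)
print(parsed.isoformat())  # 2020-01-04T01:38:33+00:00
```

If this parses without a ValueError, the format string matches the data, and any remaining problem lies in how pandas applies it.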
2. Pandas limitation with timezone
Parsing the UTC offset with %z does not work on a pd.Series, at least in some pandas versions (it only works on index values). So this will not work:
df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S %z UTC", utc=True)
A workaround is to parse each string with Python's built-in datetime module:
from datetime import datetime
# Parse each string into a timezone-aware datetime
f = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S %z UTC")
# Convert the parsed values to a UTC-aware datetime column
df['start_time'] = pd.to_datetime(df['start_time'].apply(f), utc=True)
@fmarm's answer only handles the date and time, not the UTC timezone.
Upvotes: 0
Reputation: 4284
You just need to use a timestamp format that to_datetime will recognize:
df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S +0000 UTC")
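Note that this format treats "+0000 UTC" as literal text, so the parsed values carry no timezone. A minimal sketch, assuming every row really ends in "+0000 UTC" and using made-up sample rows, of how UTC could be attached afterwards:

```python
import pandas as pd

# Hypothetical sample data in the format shown in the question
df = pd.DataFrame({"start_time": ["2020-01-04 01:38:33 +0000 UTC",
                                  "2020-01-05 12:00:00 +0000 UTC"]})

# "+0000 UTC" is matched literally, so the result is timezone-naive
naive = pd.to_datetime(df["start_time"], format="%Y-%m-%d %H:%M:%S +0000 UTC")

# Attach UTC explicitly; this is only valid because every offset is +0000
df["start_time"] = naive.dt.tz_localize("UTC")
```

If the column could contain other offsets, the literal match would raise an error on those rows, which is a reason to prefer a %z-based format where it is available.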
Upvotes: 1