newbie

Reputation: 391

Unable to get time difference between two pandas DataFrame columns

I have a pandas DataFrame with several columns, two of which are start_time and end_time. The values in those columns look like this: 2020-01-04 01:38:33 +0000 UTC

I can't convert these strings to datetime objects because I can't get the format string right:

df['start_time'] = pd.to_datetime(df['start_time'], format="yyyy-MM-dd HH:mm:ss +0000 UTC")

I also tried "yyyy-MM-dd HH:mm:ss %z UTC" as the format.

This gives the following error:

ValueError: time data '2020-01-04 01:38:33 +0000 UTC' does not match format 'yyyy-MM-dd HH:mm:ss +0000 UTC' (match)

Upvotes: 1

Views: 240

Answers (2)

loginmind

Reputation: 603

A few notes on this problem:

1. About your error

Regarding "This gives the error":

You passed an incorrect datetime format, and that is what causes the error. pandas expects strftime-style directives (%Y, %m, %d, ...), not patterns like yyyy-MM-dd; see https://strftime.org/ for the full list. The correct format for this problem is "%Y-%m-%d %H:%M:%S %z UTC".
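As a quick sanity check (a minimal sketch using only the standard library, not part of the original answer), the corrected format string can be tried on the sample value from the question with datetime.strptime, which uses the same directives:

```python
from datetime import datetime

raw = "2020-01-04 01:38:33 +0000 UTC"
# %z consumes the "+0000" offset; the trailing " UTC" is matched as literal text
fmt = "%Y-%m-%d %H:%M:%S %z UTC"

parsed = datetime.strptime(raw, fmt)
print(parsed)              # 2020-01-04 01:38:33+00:00
print(parsed.utcoffset())  # 0:00:00
```

If the format did not match, strptime would raise a ValueError similar to the one in the question.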

2. Pandas limitation with time zones

Parsing the UTC offset with %z may not work on a pd.Series in some pandas versions (it only works on index values). So this may fail:

df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S %z UTC", utc=True)

A workaround is to parse each string with Python's built-in datetime module and then convert the result with pandas:

import pandas as pd
from datetime import datetime

# %z parses the "+0000" offset; " UTC" is matched as literal text
f = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S %z UTC")
df['start_time'] = pd.to_datetime(df['start_time'].apply(f), utc=True)
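For a self-contained run of this workaround, here is a sketch on a tiny DataFrame; the row values (especially end_time) are made up for illustration, and parse_utc is just a hypothetical helper name. Once both columns are timezone-aware datetimes, subtracting them gives the time difference the question title asks about:

```python
import pandas as pd
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z UTC"

def parse_utc(value):
    # Parse one string with the stdlib; %z handles the "+0000" offset
    return datetime.strptime(value, FMT)

# Hypothetical sample data in the question's format
df = pd.DataFrame({
    "start_time": ["2020-01-04 01:38:33 +0000 UTC", "2020-01-04 10:00:00 +0000 UTC"],
    "end_time": ["2020-01-04 02:40:03 +0000 UTC", "2020-01-05 10:00:30 +0000 UTC"],
})

for col in ("start_time", "end_time"):
    df[col] = pd.to_datetime(df[col].apply(parse_utc), utc=True)

# Subtracting two datetime columns yields a timedelta64 column
df["duration"] = df["end_time"] - df["start_time"]
df["duration_s"] = df["duration"].dt.total_seconds()
print(df[["duration", "duration_s"]])
```

With these sample rows the durations are 3690 and 86430 seconds. Using utc=True keeps both columns in the same time zone, so the subtraction is well defined.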

@fmarm's answer only handles the date and time: it matches "+0000 UTC" as literal text, so the UTC offset is not parsed and the resulting timestamps are timezone-naive.

Upvotes: 0

fmarm

Reputation: 4284

You just need to use a format that to_datetime recognizes, i.e. strftime-style directives:

df['start_time'] = pd.to_datetime(df['start_time'], format="%Y-%m-%d %H:%M:%S +0000 UTC")
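A minimal sketch of this approach on made-up values: because "+0000 UTC" is matched as literal text, the parsed columns are timezone-naive. That is still fine for computing a time difference, assuming every value carries the same +0000 offset (a string with any other offset would fail to match the format):

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S +0000 UTC"  # "+0000 UTC" is literal text here

# Hypothetical sample values in the question's format
df = pd.DataFrame({
    "start_time": ["2020-01-04 01:38:33 +0000 UTC"],
    "end_time": ["2020-01-04 01:45:00 +0000 UTC"],
})
df["start_time"] = pd.to_datetime(df["start_time"], format=FMT)
df["end_time"] = pd.to_datetime(df["end_time"], format=FMT)

# Naive datetimes: no time zone attached, but subtraction still works
print(df["start_time"].dt.tz)  # None
print((df["end_time"] - df["start_time"]).dt.total_seconds())
```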

Upvotes: 1
