Dmitry

tzinfo in pandas and datetime seem to be different. Is there a workaround?

I am trying to find the time difference between two datetimes. One is created with the datetime module, and the other is read from a CSV file into a DataFrame.

The CSV file:

,Timestamp,Value
1,2020-04-21 00:46:23,24.965867802122457

Actual code:

import pandas as pd
from datetime import datetime, timezone

# Unix epoch as a timezone-aware datetime using the standard library's UTC object
EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)

df = pd.read_csv('./Out/bottom_clamp_pressure.csv', index_col=0, header=0)
# Parse the whole column at once as UTC-aware timestamps
df['Timestamp'] = pd.to_datetime(df['Timestamp'], utc=True)

print(EPOCH)
print(df.loc[1, 'Timestamp'])

# Output:
# 1970-01-01 00:00:00+00:00
# 2020-04-21 00:46:23+00:00

print(EPOCH.tzinfo)
print(df.loc[1, 'Timestamp'].tzinfo)

# Output:
# UTC
# UTC

print(EPOCH.tzinfo == df.loc[1, 'Timestamp'].tzinfo)

# Output:
# False

print(df.loc[1, 'Timestamp'] - EPOCH)

# Output:
# TypeError: Timestamp subtraction must have the same timezones or no timezones

As the output above shows, both timestamps appear to be in UTC, yet their tzinfo objects do not compare equal, and subtracting one from the other raises a TypeError. Is there a workaround that lets me compute the difference?

Thanks!

Answers (1)

FObersteiner

pandas (at least in the version used here) represents UTC with pytz's UTC object [src], which does not compare equal to datetime.timezone.utc from the Python standard library's datetime module:

from datetime import datetime, timezone
import pandas as pd
import pytz

s = '2020-04-21 00:46:23'

t = pd.to_datetime(s, utc=True)
t.tzinfo
# <UTC>

d = datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
d.tzinfo
# datetime.timezone.utc

t.tzinfo == d.tzinfo
# False

d = d.replace(tzinfo=pytz.utc)
t.tzinfo == d.tzinfo
# True
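To see that the mismatch lies only in the tzinfo objects and not in the moments they describe, here is a small sketch using the standard library and pytz alone (it assumes pytz is installed). Both objects report a zero UTC offset, and plain datetime comparison and subtraction accept the two of them; the TypeError in the question comes from pandas' own check that both Timestamps carry the same timezone.

```python
from datetime import datetime, timedelta, timezone
import pytz

# Two distinct tzinfo objects that both describe UTC
std_utc = timezone.utc   # standard library
pytz_utc = pytz.utc      # pytz

d1 = datetime(2020, 4, 21, 0, 46, 23, tzinfo=std_utc)
d2 = datetime(2020, 4, 21, 0, 46, 23, tzinfo=pytz_utc)

# The tzinfo objects themselves do not compare equal ...
print(std_utc == pytz_utc)                # False
# ... but they report the same offset from UTC,
print(d1.utcoffset() == d2.utcoffset())   # True
# so the aware datetimes denote the same instant,
print(d1 == d2)                           # True
# and the standard library subtracts them without complaint.
print(d1 - d2)                            # 0:00:00
```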

So one solution is to give EPOCH the same pytz UTC object that pandas uses:

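import pytz
from datetime import datetime

# Epoch carrying pytz's UTC object, matching the tzinfo of the parsed column
EPOCH = datetime.fromtimestamp(0, tz=pytz.utc)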
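Another option, sketched here as an assumption rather than taken from the answer above, is to build the epoch with pandas itself, so it carries whatever UTC object pandas assigns to the parsed column. The CSV row from the question is inlined with io.StringIO so the snippet runs on its own; the column names SinceEpoch and SecondsSinceEpoch are illustrative only.

```python
import io
import pandas as pd

# Stand-in for ./Out/bottom_clamp_pressure.csv, using the row from the question
csv_text = """,Timestamp,Value
1,2020-04-21 00:46:23,24.965867802122457
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)
df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True)

# Epoch created by pandas, so its tzinfo matches the column's tzinfo
EPOCH = pd.Timestamp("1970-01-01", tz="UTC")

delta = df.loc[1, "Timestamp"] - EPOCH
print(delta)                   # 18373 days 00:46:23
print(delta.total_seconds())   # 1587429983.0

# Column-wise: Timedelta values, then plain seconds
df["SinceEpoch"] = df["Timestamp"] - EPOCH
df["SecondsSinceEpoch"] = df["SinceEpoch"].dt.total_seconds()
```

Because both operands come from pandas, this approach should not depend on which UTC implementation a particular pandas version uses internally.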
