Reputation: 195
I am trying to find the time difference between two datetimes. One is created with the datetime module and the other is read from a CSV file into a DataFrame.
The CSV file:
,Timestamp,Value
1,2020-04-21 00:46:23,24.965867802122457
Actual code:
import pandas as pd
import numpy as np
from datetime import datetime, timezone
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=timezone.utc)
df = pd.read_csv('./Out/bottom_clamp_pressure.csv', index_col = 0, header = 0)
df['Timestamp'] = df['Timestamp'].apply(pd.to_datetime, utc = True)
print(EPOCH)
print(df.loc[1, 'Timestamp'])
# Output:
# 1970-01-01 00:00:00+00:00
# 2020-04-21 00:46:23+00:00
print(EPOCH.tzinfo)
print(df.loc[1, 'Timestamp'].tzinfo)
# Output:
# UTC
# UTC
print(EPOCH.tzinfo == df.loc[1, 'Timestamp'].tzinfo)
# Output:
# False
print(df.loc[1, 'Timestamp'] - EPOCH)
# Output:
# TypeError: Timestamp subtraction must have the same timezones or no timezones
As you can see in the output above, both datetimes appear to be in UTC, yet their tzinfo objects do not compare equal and subtracting them fails. Is there a workaround that lets me get the subtraction result?
Thanks!
Upvotes: 1
Views: 2249
Reputation: 25554
pandas uses pytz's timezone model for UTC [src], which does not compare equal to the one used by the datetime module from the Python standard lib:
from datetime import datetime, timezone
import pandas as pd
import pytz
s = '2020-04-21 00:46:23'
t = pd.to_datetime(s, utc=True)
t.tzinfo
# <UTC>
d = datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
d.tzinfo
# datetime.timezone.utc
t.tzinfo == d.tzinfo
# False
d = d.replace(tzinfo=pytz.utc)
t.tzinfo == d.tzinfo
# True
So a solution could be to use
EPOCH = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
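An alternative that avoids importing pytz might be to build the epoch with pandas as well, so that both operands get their UTC tzinfo from the same library, whichever object pandas uses internally. A minimal sketch, assuming a small in-memory DataFrame as a stand-in for the CSV file (the SecondsSinceEpoch column name is made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the CSV data from the question.
df = pd.DataFrame({"Timestamp": ["2020-04-21 00:46:23"]}, index=[1])
df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True)

# Epoch created by pandas itself, so its tzinfo matches the column's tzinfo.
EPOCH = pd.Timestamp("1970-01-01", tz="UTC")

# Scalar subtraction yields a pandas Timedelta.
delta = df.loc[1, "Timestamp"] - EPOCH
seconds = delta.total_seconds()
print(seconds)
# 1587429983.0

# The same subtraction applied to the whole column at once.
df["SecondsSinceEpoch"] = (df["Timestamp"] - EPOCH).dt.total_seconds()
```

Since both values are pandas Timestamps here, the comparison of tzinfo objects between pandas and the standard library never comes into play.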
Upvotes: 1