Reputation: 43
from datetime import datetime
import pandas as pd
date="2020-02-07T16:05:16.000000000"
#Convert using datetime
t1=datetime.strptime(date[:-3],'%Y-%m-%dT%H:%M:%S.%f')
#Convert using Pandas
t2=pd.to_datetime(date)
#Subtract the dates
print(t1-t2)
#subtract the date timestamps
print(t1.timestamp()-t2.timestamp())
In this example, my understanding is that both datetime and pandas should produce timezone-naive datetimes. Can anyone explain why the difference between the datetimes is zero, but the difference between the timestamps is not? It's off by 5 hours for me, which is my time zone's offset from GMT.
Upvotes: 4
Views: 1279
Reputation: 25544
Naive datetime objects of Python's datetime.datetime class represent local time. This is stated in the docs but can be a brain-teaser to work with nevertheless. If you call the timestamp method on one, it is interpreted as local time, and the returned POSIX timestamp refers to UTC (seconds since the epoch), as it should.
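To see the local-time interpretation in isolation, one can compare the timestamp of a naive datetime with that of the same object after the system's local zone has been attached via astimezone(). This is a minimal sketch using only the standard library; the variable names are just for illustration.

```python
from datetime import datetime

naive = datetime(2020, 2, 7, 16, 5, 16)  # no tzinfo attached
local_aware = naive.astimezone()         # attaches the system's local time zone

# Both objects describe the same instant, so their POSIX timestamps agree
print(naive.timestamp() == local_aware.timestamp())
```

The comparison should print True regardless of the machine's time zone, since timestamp() on the naive object applies the same local-time assumption that astimezone() makes explicit.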
Coming from the Python datetime object, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it's not so obvious). Derived the same way from a tz-naive string, it doesn't represent local time but UTC. You can verify that by localizing the datetime object to UTC:
from datetime import datetime, timezone
import pandas as pd
date = "2020-02-07T16:05:16.000000000"
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)
print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0
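This also accounts for the 5 hours seen in the question: if the naive datetime is read as local time and the naive pandas.Timestamp as UTC, the gap between the two timestamps should be exactly the local UTC offset with its sign flipped. A rough sketch of that check, where local_offset is a name introduced here for illustration:

```python
from datetime import datetime
import pandas as pd

date = "2020-02-07T16:05:16.000000000"
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')  # naive, read as local time by timestamp()
t2 = pd.to_datetime(date)                                  # naive, read as UTC by timestamp()

# UTC offset of the local zone at that wall-clock time, in seconds
local_offset = t1.astimezone().utcoffset().total_seconds()

# The timestamp gap should equal the negated local offset (e.g. 18000.0 for UTC-5)
print(t1.timestamp() - t2.timestamp(), -local_offset)
```

The subtraction t1 - t2 still gives zero because it only compares the wall-clock fields of two naive objects; no time zone is involved at that point.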
The other way around, you can make the pandas.Timestamp timezone-aware, e.g.:
t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')
# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0
My bottom line is that if you know the timestamps you have represent a certain time zone, work with timezone-aware datetimes, e.g. for UTC:
import pytz # pytz matches what pandas used internally when this was written; datetime.timezone.utc also works
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)
print(t1 == t2)
# True
print(t1-t2)
# 0 days 00:00:00
print(t1.timestamp()-t2.timestamp())
# 0.0
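The same idea can be sketched for a zone other than UTC. The example below assumes the data was recorded in "America/New_York" (a zone picked here purely for illustration, matching a 5-hour offset from GMT in February) and uses the standard-library zoneinfo module (Python 3.9+) on the datetime side and tz_localize on the pandas side.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library, Python 3.9+
import pandas as pd

date = "2020-02-07T16:05:16.000000000"
zone = "America/New_York"  # assumed zone, for illustration only

# Attach the zone to the naive datetime; this is safe with zoneinfo objects
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=ZoneInfo(zone))
# Declare the naive Timestamp to be wall-clock time in the same zone
t2 = pd.to_datetime(date).tz_localize(zone)

print(t1 == t2)
print(t1.timestamp() - t2.timestamp())
```

Note that replace(tzinfo=...) is fine with zoneinfo and with pytz.UTC, but with other pytz zones it is known to pick a wrong offset; there, the zone's localize() method would be needed instead.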
Upvotes: 1