Dan

Reputation: 43

Python datetime and pandas give different timestamps for the same date

from datetime import datetime
import pandas as pd

date = "2020-02-07T16:05:16.000000000"

# Convert using datetime (strip the last 3 digits, %f takes at most 6)
t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')

# Convert using pandas
t2 = pd.to_datetime(date)

# Subtract the dates
print(t1 - t2)

# Subtract the POSIX timestamps
print(t1.timestamp() - t2.timestamp())

In this example, my understanding is that both datetime and pandas should produce timezone-naive dates. Can anyone explain why the difference between the dates is zero, but the difference between the timestamps is not? For me it is off by 5 hours, which is my time zone's offset from GMT.

Upvotes: 4

Views: 1279

Answers (1)

FObersteiner

Reputation: 25544

Naive datetime objects of Python's datetime.datetime class are treated as local time. This is stated in the docs but can still be a brain-teaser to work with. If you call the timestamp method on one, the wall-clock time is interpreted in your machine's local time zone, so the returned POSIX timestamp counts seconds since the epoch (1970-01-01 UTC), as it should.
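
A minimal sketch of that local-time assumption, using only the standard library. The variable names are illustrative and not taken from the question; the two checks should hold whatever the machine's local time zone is:

```python
from datetime import datetime, timezone

naive = datetime(2020, 2, 7, 16, 5, 16)  # no tzinfo attached

# astimezone() with no argument attaches the system's local time zone
local_aware = naive.astimezone()
# replace() reinterprets the same wall-clock time as UTC instead
utc_aware = naive.replace(tzinfo=timezone.utc)

# timestamp() on the naive object matches the local-time interpretation
same_as_local = naive.timestamp() == local_aware.timestamp()

# the gap to the UTC interpretation is the local UTC offset in seconds
gap = utc_aware.timestamp() - naive.timestamp()
offset = local_aware.utcoffset().total_seconds()

print(same_as_local)   # True
print(gap == offset)   # True
```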

Coming from Python's datetime, the behavior of a naive pandas.Timestamp can be counter-intuitive (and I think it is less obvious). Derived the same way from a tz-naive string, its timestamp method treats the wall-clock time as UTC, not as local time. You can verify that by setting the datetime object's tzinfo to UTC:

from datetime import datetime, timezone
import pandas as pd

date = "2020-02-07T16:05:16.000000000"

t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f')
t2 = pd.to_datetime(date)

print(t1.replace(tzinfo=timezone.utc).timestamp() - t2.timestamp())
# 0.0

Conversely, you can make the pandas.Timestamp timezone-aware, e.g.:

t3 = pd.to_datetime(t1.astimezone())
# e.g. Timestamp('2020-02-07 16:05:16+0100', tz='Mitteleuropäische Zeit')

# now both t1 and t3 represent my local time:
print(t1.timestamp() - t3.timestamp())
# 0.0

My bottom line: if you know that your timestamps belong to a certain time zone, work with timezone-aware datetime objects, e.g. for UTC:

import pytz  # older pandas versions use pytz internally; datetime.timezone.utc works here as well

t1 = datetime.strptime(date[:-3], '%Y-%m-%dT%H:%M:%S.%f').replace(tzinfo=pytz.UTC)
t2 = pd.to_datetime(date, utc=True)

print(t1 == t2)
# True
print(t1 - t2)
# 0 days 00:00:00
print(t1.timestamp() - t2.timestamp())
# 0.0
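
The same idea applies to zones other than UTC. As a sketch, assume purely for illustration that the string was recorded at a fixed UTC-5 offset (the question only mentions a 5-hour offset from GMT, so the sign is an assumption, and `source_tz` is a made-up name). Once the offset is attached, the timestamp no longer depends on the machine's local time zone:

```python
from datetime import datetime, timedelta, timezone

date = "2020-02-07T16:05:16.000000000"

# Assumption for illustration: the string was recorded at a fixed UTC-5 offset
source_tz = timezone(timedelta(hours=-5))

# strptime's %f accepts at most 6 fractional digits, so drop the last 3
naive = datetime.strptime(date[:-3], "%Y-%m-%dT%H:%M:%S.%f")
aware = naive.replace(tzinfo=source_tz)

# Convert to UTC to see which instant the aware object refers to
as_utc = aware.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2020-02-07T21:05:16+00:00
print(aware.timestamp())   # 1581109516.0
```

On the pandas side, Timestamp.tz_localize attaches a zone to a naive Timestamp in the same spirit.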

Upvotes: 1
