asmaier
asmaier

Reputation: 11776

How to handle timeseries with variable timezone offset due to daylight savings time in Python Pandas?

Recently my Python Pandas code broke. The problem was the creation of a DateTimeIndex from a list of timestamps that used to work in the past. My first thought was that somehow my timestamps have an invalid format, but after some debugging I noted that until entry number 102 in my list, the conversion into a DateTimeIndex worked

>>> pd.to_datetime(df.created.head(n=102))
datetime
...
2019-10-28 15:16:24+01:00   2019-10-28 15:16:24+01:00
2019-10-28 14:50:54+01:00   2019-10-28 14:50:54+01:00
Name: created, Length: 102, dtype: datetime64[ns, pytz.FixedOffset(60)]

But the next entry in my list silently breaks the creation of the DateTimeIndex. Pandas just returns a list of dtype: object instead:

>>> pd.to_datetime(df.created.head(n=103))
datetime
...
2019-10-28 15:16:24+01:00    2019-10-28 15:16:24+01:00
2019-10-28 14:50:54+01:00    2019-10-28 14:50:54+01:00
2019-10-25 15:06:57+02:00    2019-10-25 15:06:57+02:00
Name: created, Length: 103, dtype: object

After staring at the timestamps I finally noted that the problem must be the different timezone offset due to the end of daylight savings time. A minimal example to reproduce the problem is:

# works
>>> import pandas as pd
>>> pd.to_datetime(["2019-11-20 11:46:32+01:00"])
DatetimeIndex(['2019-11-20 11:46:32+01:00'], dtype='datetime64[ns, pytz.FixedOffset(60)]', freq=None)
# works
>>> pd.to_datetime(["2019-10-25T15:06:57.000+0200"])
DatetimeIndex(['2019-10-25 15:06:57+02:00'], dtype='datetime64[ns, pytz.FixedOffset(120)]', freq=None)
# doesn't work 
>>> pd.to_datetime(["2019-11-20 11:46:32+01:00", "2019-10-25T15:06:57.000+0200"])
Index([2019-11-20 11:46:32+01:00, 2019-10-25 15:06:57+02:00], dtype='object')

Is this a bug or is this a feature of Pandas? Why does pandas not give a warning or error, that it cannot convert the timestamps into a DateTimeIndex. And how can I fix my code, so that my timestamps with variable timezone offset are converted successfully into a DateTimeIndex?

Upvotes: 1

Views: 625

Answers (1)

Dani Mesejo
Dani Mesejo

Reputation: 61930

IIUC, you need to set utc=True:

i = pd.to_datetime(["2019-11-20 11:46:32+01:00", "2019-10-25T15:06:57.000+0200"], utc=True)
print(i)

Output

DatetimeIndex(['2019-11-20 10:46:32+00:00', '2019-10-25 13:06:57+00:00'], dtype='datetime64[ns, UTC]', freq=None)

From the documentation:

utc : boolean, default None

Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).

Upvotes: 1

Related Questions