Reputation: 693
I have pandas DataFrames with time-series that cover middle european daylight saving time disturbances each spring and autum for a few years.
For some dfs, everything works as expected using dt.tz_localize
but I have troubles solving the following time notation:
not_working = pd.to_datetime(
pd.Series(
[
"2017-10-29 01:45:00",
"2017-10-29 02:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 03:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 03:00:00",
]
)
)
not_working = not_working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")
which results in this error message:
pytz.exceptions.AmbiguousTimeError: 2017-10-29 02:00:00
When localizing 02:00 twice instead, it works as expected. Sadly, I can't control the input data format. Thus, I am wondering how to localize my data above.
working = pd.to_datetime(
pd.Series(
[
"2017-10-29 01:45:00",
"2017-10-29 02:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 02:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 03:00:00",
]
)
)
working = working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")
This fix works, but I'd prefer a less hacky solution. The nonexisting
parameter does not work together with ambigous
as I don't want to build a boolean array first.
not_working = not_working - pd.Timedelta(seconds=1)
not_working = not_working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")
not_working = not_working + pd.Timedelta(seconds=1)
Upvotes: 4
Views: 2012
Reputation: 29407
The issue is that you're starting with datetime data that's not time-zone aware and so tz_localize
is not able to resolve ambiguities for some special dates/times related to daylight savings.
If you make your dataframe data time-zone aware to begin with by using the option utc=True
, you can convert it to the desired time zone:
working = pd.to_datetime(
pd.Series(
[
"2017-10-29 01:45:00",
"2017-10-29 02:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 03:00:00",
"2017-10-29 02:15:00",
"2017-10-29 02:45:00",
"2017-10-29 03:00:00",
]
),
utc=True
)
working.dtypes
# datetime64[ns, UTC]
working.dt.tz_convert(tz="Europe/Berlin")
# 0 2017-10-29 02:45:00+01:00
# 1 2017-10-29 03:00:00+01:00
# 2 2017-10-29 03:15:00+01:00
# 3 2017-10-29 03:45:00+01:00
# 4 2017-10-29 04:00:00+01:00
# 5 2017-10-29 03:15:00+01:00
# 6 2017-10-29 03:45:00+01:00
# 7 2017-10-29 04:00:00+01:00
If the original data are not in the UTC time zone you can adjust them before converting adding or subtracting the appropriate time interval
working = working-pd.Timedelta('01:00:00')
working.dt.tz_convert(tz="Europe/Berlin")
# 0 2017-10-29 02:45:00+02:00
# 1 2017-10-29 02:00:00+01:00
# 2 2017-10-29 02:15:00+01:00
# 3 2017-10-29 02:45:00+01:00
# 4 2017-10-29 03:00:00+01:00
# 5 2017-10-29 02:15:00+01:00
# 6 2017-10-29 02:45:00+01:00
# 7 2017-10-29 03:00:00+01:00
# dtype: datetime64[ns, Europe/Berlin]
Upvotes: 1