ivegotaquestion
ivegotaquestion

Reputation: 693

localize pandas datetime during daylight saving with multiple notations

I have pandas DataFrames with time-series that cover middle european daylight saving time disturbances each spring and autum for a few years. For some dfs, everything works as expected using dt.tz_localize but I have troubles solving the following time notation:

not_working = pd.to_datetime(
    pd.Series(
        [
            "2017-10-29 01:45:00",
            "2017-10-29 02:00:00",
            "2017-10-29 02:15:00",
            "2017-10-29 02:45:00",
            "2017-10-29 03:00:00",
            "2017-10-29 02:15:00",
            "2017-10-29 02:45:00",
            "2017-10-29 03:00:00",
        ]
    )
)
not_working = not_working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")

which results in this error message:

pytz.exceptions.AmbiguousTimeError: 2017-10-29 02:00:00

When localizing 02:00 twice instead, it works as expected. Sadly, I can't control the input data format. Thus, I am wondering how to localize my data above.

working = pd.to_datetime(
pd.Series(
    [
        "2017-10-29 01:45:00",
        "2017-10-29 02:00:00",
        "2017-10-29 02:15:00",
        "2017-10-29 02:45:00",
        "2017-10-29 02:00:00",
        "2017-10-29 02:15:00",
        "2017-10-29 02:45:00",
        "2017-10-29 03:00:00",
        ]
    )
)

working = working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")

This fix works, but I'd prefer a less hacky solution. The nonexisting parameter does not work together with ambigous as I don't want to build a boolean array first.

not_working = not_working - pd.Timedelta(seconds=1)
not_working = not_working.dt.tz_localize(tz="Europe/Berlin", ambiguous="infer")
not_working = not_working + pd.Timedelta(seconds=1)

Upvotes: 4

Views: 2012

Answers (1)

user2314737
user2314737

Reputation: 29407

The issue is that you're starting with datetime data that's not time-zone aware and so tz_localize is not able to resolve ambiguities for some special dates/times related to daylight savings.

If you make your dataframe data time-zone aware to begin with by using the option utc=True, you can convert it to the desired time zone:

working = pd.to_datetime(
    pd.Series(
        [
            "2017-10-29 01:45:00",
            "2017-10-29 02:00:00",
            "2017-10-29 02:15:00",
            "2017-10-29 02:45:00",
            "2017-10-29 03:00:00",
            "2017-10-29 02:15:00",
            "2017-10-29 02:45:00",
            "2017-10-29 03:00:00",
        ]
    ),
    utc=True
)

working.dtypes
# datetime64[ns, UTC]

working.dt.tz_convert(tz="Europe/Berlin")
# 0   2017-10-29 02:45:00+01:00
# 1   2017-10-29 03:00:00+01:00
# 2   2017-10-29 03:15:00+01:00
# 3   2017-10-29 03:45:00+01:00
# 4   2017-10-29 04:00:00+01:00
# 5   2017-10-29 03:15:00+01:00
# 6   2017-10-29 03:45:00+01:00
# 7   2017-10-29 04:00:00+01:00

If the original data are not in the UTC time zone you can adjust them before converting adding or subtracting the appropriate time interval

working = working-pd.Timedelta('01:00:00')

working.dt.tz_convert(tz="Europe/Berlin")
# 0   2017-10-29 02:45:00+02:00
# 1   2017-10-29 02:00:00+01:00
# 2   2017-10-29 02:15:00+01:00
# 3   2017-10-29 02:45:00+01:00
# 4   2017-10-29 03:00:00+01:00
# 5   2017-10-29 02:15:00+01:00
# 6   2017-10-29 02:45:00+01:00
# 7   2017-10-29 03:00:00+01:00
# dtype: datetime64[ns, Europe/Berlin]

Upvotes: 1

Related Questions