Eric O
Eric O

Reputation: 181

tz_localize AmbiguousTimeError: Cannot infer dst time with non DST dates

I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. Python pandas tz_localize throws NonExistentTimeError, then unable to drop erroneous times gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.

When I run my code:

df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])

def clean_time(obj):
    hour = int(obj[0:-3])
    hour = str(hour - 1)
    if len(str(hour)) == 2:
        return hour+":00"
    else:
        return "0" + hour + ":00"

df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')

I get:

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times

If I leave ambiguous='raise' (the default), it gives me:

pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00

So I'm stuck on either the start, or end of daylight savings time.

There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift.

Upvotes: 11

Views: 9630

Answers (1)

Freek Wiekmeijer
Freek Wiekmeijer

Reputation: 4950

Minimal reproduction scenario:

from datetime import datetime, timedelta

import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
df.index.tz_localize('Europe/Amsterdam', ambiguous='infer')

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00 as there are no repeated times

Solution: manually specify which datetime objects must be considered DT (Daylight Time) or DST (Daylight Savings Time). See documentation.

from datetime import datetime, timedelta
import numpy as np
import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
infer_dst = np.array([False] * df.shape[0])  # all False -> every row considered DT, alternative is True to indicate DST. The array must correspond to the iloc of df.index
df.index.tz_localize('Europe/Amsterdam', ambiguous=infer_dst)  # no error

Upvotes: 12

Related Questions