Mahran

Reputation: 157

KeyError when trying to forecast using ExponentialSmoothing

I'm trying to forecast my city's population. I have a table showing the city's population from 1950 to 2021. Using pandas and ExponentialSmoothing, I want to forecast the population for the next 10 years. I'm stuck here:

train_data = df.iloc[:60]
test_data = df.iloc[59:]

fitted = ExponentialSmoothing(train_data["Population"],
                         trend = "add",
                         seasonal = "add",
                         seasonal_periods=12).fit()

fitted.forecast(10)

However, I get this message:

'The start argument could not be matched to a location related to the index of the data.'

Update: here is some of the code from my work:

Jeddah_tb = pd.read_html("https://www.macrotrends.net/cities/22421/jiddah/population", match ="Jiddah - Historical Population Data", parse_dates=True)

df['Year'] = pd.to_datetime(df['Year'], format="%Y")
df.set_index("Year", inplace=True)

and here is the index:

DatetimeIndex(['2021-01-01', '2020-01-01', '2019-01-01', '2018-01-01',
           '2017-01-01', '2016-01-01', '2015-01-01', '2014-01-01',
           '2013-01-01', '2012-01-01', '2011-01-01', '2010-01-01',
           '2009-01-01', '2008-01-01', '2007-01-01', '2006-01-01',
           '2005-01-01', '2004-01-01', '2003-01-01', '2002-01-01',
           '2001-01-01', '2000-01-01', '1999-01-01', '1998-01-01',
           '1997-01-01', '1996-01-01', '1995-01-01', '1994-01-01',
           '1993-01-01', '1992-01-01', '1991-01-01', '1990-01-01',
           '1989-01-01', '1988-01-01', '1987-01-01', '1986-01-01',
           '1985-01-01', '1984-01-01', '1983-01-01', '1982-01-01',
           '1981-01-01', '1980-01-01', '1979-01-01', '1978-01-01',
           '1977-01-01', '1976-01-01', '1975-01-01', '1974-01-01',
           '1973-01-01', '1972-01-01', '1971-01-01', '1970-01-01',
           '1969-01-01', '1968-01-01', '1967-01-01', '1966-01-01',
           '1965-01-01', '1964-01-01', '1963-01-01', '1962-01-01',
           '1961-01-01', '1960-01-01', '1959-01-01', '1958-01-01',
           '1957-01-01', '1956-01-01', '1955-01-01', '1954-01-01',
           '1953-01-01', '1952-01-01', '1951-01-01', '1950-01-01'],
          dtype='datetime64[ns]', name='Year', freq='-1AS-JAN')

Upvotes: 0

Views: 970

Answers (1)

MonkeyDLuffy

Reputation: 558

I couldn't reproduce the issue with your code. However, for time series forecasting, make sure your data is sorted in ascending date order first:

df = df.sort_values(by='Year', ascending=True)

In your case, train_data runs from 2021 back to 1962 and test_data from 1962 back to 1950, so you are training on recent data and testing on the past. Sorting the dataframe in ascending order fixes that. Also use test_data = df.iloc[60:], because with iloc[59:] row 59 (the year 1962 in your current ordering) is present in both train_data and test_data.

Upvotes: 2
