Lenka Vraná

Reputation: 1706

How to get predictions using X-13-ARIMA in python statsmodels

I'm trying to run the X-13-ARIMA model from the statsmodels library in Python 3.

I found this example in statsmodels documentation:

import statsmodels.api as sm

dta = sm.datasets.co2.load_pandas().data
dta.co2.interpolate(inplace=True)
dta = dta.resample('M').sum()

res = sm.tsa.x13_arima_select_order(dta.co2)
print(res.order, res.sorder)

results = sm.tsa.x13_arima_analysis(dta.co2)

fig = results.plot()
fig.set_size_inches(12, 5)
fig.tight_layout()

This works fine, but I also need to predict future values of this time series. The tsa.x13_arima_analysis() function has a forecast_years parameter, so I suppose it should be possible. However, the results object doesn't seem to change no matter what value of forecast_years I choose.

How can I get the forecast values?

Upvotes: 2

Views: 10137

Answers (3)

kevin_theinfinityfund

Reputation: 2187

Late response but hopefully helpful.

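from statsmodels.tsa.x13 import x13_arima_analysis

# df, dependent_var and output_directory stand for your own data, column name and path
result = x13_arima_analysis(df[dependent_var],
                            tempdir=output_directory,
                            forecast_periods=12)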

The forecast table is then somewhere inside the result.results output. In my case that output was HTML, so I located the table through the id of its enclosing div, id=fct.

from bs4 import BeautifulSoup
import pandas as pd

result_string = result.results 
soup = BeautifulSoup(result_string, 'lxml') 
specific_section = soup.find('div', id='fct') 
table = specific_section.find('table') if specific_section else None

This gives you the table as a BeautifulSoup element; printing it shows the HTML below, which you can then parse however you like. I needed a DataFrame (example further down).

<table class="w70" summary="Confidence intervals with coverage probability ( 0.95000)">
<caption><strong>Confidence intervals with coverage probability ( 0.95000) <br/> On the Original Scale</strong></caption>
<tr>
<th scope="col">Date</th>
<th scope="col">Lower</th>
<th scope="col">Forecast</th>
<th scope="col">Upper</th>
</tr>
<tr>
<th scope="row">2023.Aug</th>
<td>  3560.68    </td>
<td>  3694.21    </td>
<td>  3832.74    </td>
</tr>
<tr>
<th scope="row">2023.Sep</th>
<td>  3393.61    </td>
<td>  3579.02    </td>
<td>  3774.55    </td>
</tr>
<tr>
<th scope="row">2023.Oct</th>
<td>  3275.37    </td>
<td>  3491.64    </td>
<td>  3722.18    </td>

Example:

if table:
    dates, lowers, forecasts, uppers = [], [], [], []
    for row in table.find_all('tr')[1:]:  # skip the header row
        columns = row.find_all('td')
        date = row.find('th', scope='row').text
        lower, forecast, upper = [col.text.strip() for col in columns]
        
        dates.append(date.replace('.', '_'))
        lowers.append(float(lower))
        forecasts.append(float(forecast))
        uppers.append(float(upper))

    # Create a DataFrame
    df = pd.DataFrame({
        'Date': dates,
        'Lower': lowers,
        'Forecast': forecasts,
        'Upper': uppers
    })

>>
        Date    Lower  Forecast    Upper
0   2023_Aug  3560.68   3694.21  3832.74
1   2023_Sep  3393.61   3579.02  3774.55
2   2023_Oct  3275.37   3491.64  3722.18
3   2023_Nov  3070.97   3318.68  3586.36
4   2023_Dec  2895.54   3162.95  3455.05
5   2024_Jan  2884.05   3179.71  3505.69
6   2024_Feb  2901.60   3228.93  3593.18
7   2024_Mar  2979.65   3343.18  3751.07
8   2024_Apr  3008.83   3401.86  3846.23
9   2024_May  3068.21   3494.67  3980.41
10  2024_Jun  3089.83   3543.59  4064.00
11  2024_Jul  3066.04   3539.45  4085.96
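The Date column is still a string at this point. If a real date is more useful, the labels can be converted with the standard library. A minimal sketch, assuming monthly labels with English three-letter abbreviations as in the output above; label_to_date is a hypothetical helper name, not something from statsmodels or pandas:

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def label_to_date(label):
    """Convert '2023_Aug' or '2023.Aug' to the first day of that month."""
    year, month = label.replace(".", "_").split("_")
    return date(int(year), MONTHS.index(month) + 1, 1)


print(label_to_date("2023_Aug"))
```

Applied to the DataFrame above, this could look like df['Date'] = df['Date'].map(label_to_date).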

Upvotes: 0

fane96

Reputation: 21

forecast_years=x worked for me. Pay attention to the version of statsmodels you are running ("pip freeze | grep statsmodels"): in version 0.10.2 the parameter for the forecasting horizon is forecast_years, but in version 0.11.0 and higher it is forecast_periods.
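If the same script has to run on both sides of that rename, one option is to pick the keyword from the installed version. A minimal sketch, assuming the rename took effect in statsmodels 0.11; forecast_kwarg is a hypothetical helper name, not part of statsmodels:

```python
def forecast_kwarg(version):
    """Return the forecast-horizon keyword name for a statsmodels version string."""
    # Compare only major.minor; assumes both parts are plain integers.
    major, minor = (int(part) for part in version.split(".")[:2])
    return "forecast_periods" if (major, minor) >= (0, 11) else "forecast_years"


print(forecast_kwarg("0.10.2"))
print(forecast_kwarg("0.11.0"))
```

With statsmodels installed, the result could be used as sm.tsa.x13_arima_analysis(series, **{forecast_kwarg(statsmodels.__version__): 12}), where series is your own data.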

A simple regex should do the trick to find your forecast values:

202\d\.\w{3}\s{6}\d\d\.\d\d\s{5}\d\d\.\d\d\s{5}\d\d\.\d\d (run on each line of your results; the dots are escaped so they match only a literal ".")

which would match:

2020.Feb      18.04     32.25     46.47
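That pattern is tied to one exact column layout (years 2020 to 2029, two-digit values, fixed spacing). A looser variant, sketched here with Python's re module, accepts any four-digit year, any amount of whitespace and any number of digits. It assumes monthly labels such as 2020.Feb; parse_forecast_line is a made-up name for illustration:

```python
import re

# A date like 2020.Feb followed by three decimal numbers: lower, forecast, upper.
FORECAST_ROW = re.compile(
    r"^\s*(\d{4}\.[A-Za-z]{3})\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def parse_forecast_line(line):
    """Return (date, lower, forecast, upper), or None if the line does not match."""
    match = FORECAST_ROW.match(line)
    if match is None:
        return None
    date, *numbers = match.groups()
    return (date, *map(float, numbers))


print(parse_forecast_line("2020.Feb      18.04     32.25     46.47"))
```

Lines with only two numeric columns, such as a forecast with its standard error, return None and can simply be filtered out.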

Upvotes: 1

Bill Bell

Reputation: 21663

By now you have probably worked this out yourself. I retrieved some monthly weather data that ends in July 2012 and ran the analysis with this statement.

results = sm.tsa.x13_arima_analysis(s, forecast_years=3)

Then (having found that results.results is voluminous) I entered this.

with open('c:/scratch/result.txt', 'w') as f:
    f.write(results.results)

Searching this file for 'forecast', I found the following section.

 FORECASTING
  Origin  2012.Jul
  Number         3

  Forecasts and Standard Errors of the Prior Adjusted Data
   ------------------------------
                         Standard
       Date   Forecast      Error
   ------------------------------
   2012.Aug      33.02      2.954
   2012.Sep      28.31      2.954
   2012.Oct      21.54      2.954
   ------------------------------

  Confidence intervals with coverage probability ( 0.95000)
   ---------------------------------------
       Date      Lower  Forecast     Upper
   ---------------------------------------
   2012.Aug      27.23     33.02     38.82
   2012.Sep      22.52     28.31     34.10
   2012.Oct      15.75     21.54     27.33
   ---------------------------------------

forecast_years=3 appears to be interpreted as three forecast periods, which for this monthly series means three months, starting after July 2012.
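Rather than reading the file by eye, the confidence-interval table can be cut out of results.results by its heading. The sketch below assumes the plain-text layout shown above (heading, dashed rule, column header, dashed rule, data rows, dashed rule); extract_intervals is a hypothetical name, and SAMPLE follows the output above:

```python
SAMPLE = """\
  Confidence intervals with coverage probability ( 0.95000)
   ---------------------------------------
       Date      Lower  Forecast     Upper
   ---------------------------------------
   2012.Aug      27.23     33.02     38.82
   2012.Sep      22.52     28.31     34.10
   2012.Oct      15.75     21.54     27.33
   ---------------------------------------
"""


def extract_intervals(text):
    """Return [(date, lower, forecast, upper), ...] from the interval table."""
    lines = iter(text.splitlines())
    # Skip everything up to and including the table heading.
    for line in lines:
        if "Confidence intervals" in line:
            break
    rows, rules_seen = [], 0
    for line in lines:
        if line.strip().startswith("---"):
            rules_seen += 1
            if rules_seen == 3:  # the third dashed rule closes the table
                break
            continue
        if rules_seen == 2:  # data rows sit between the second and third rule
            date, lower, forecast, upper = line.split()
            rows.append((date, float(lower), float(forecast), float(upper)))
    return rows


for row in extract_intervals(SAMPLE):
    print(row)
```

If the heading is not found, the function returns an empty list, which is a cheap way to notice that no forecast was produced.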

Upvotes: 2
