Reputation: 462
these are a series of values taken each hour during 30 days, I gathered them in group of each hour as show 2 groups below:
{'date':
['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
'hora0':[111666.5,121672.91666666667,87669.33333333333,89035.58333333333,91707.91666666667,94449.33333333333,103476.91666666667,123271.5,133306.58333333334,103149.91666666667,106310.25,91830.25,77733.75,96823.25,102880.25,118383.33333333333,95076.66666666667,93561.83333333333,97651.58333333333,112180.0,118051.75,135456.0,149553.0,125797.25,126098.0,128603.75,84631.08333333333,85683.16666666667,96377.16666666667,113161.16666666667],
'hora2':[83768.83333333333,83319.58333333333,72922.75,71893.75,73933.0,76598.83333333333,81021.75,93588.83333333333,94514.08333333333,87147.66666666667,91464.08333333333,74022.41666666667,63709.166666666664,75939.33333333333,79904.16666666667,84435.33333333333,76736.0,85237.33333333333,79162.75,91729.58333333333,99081.58333333333,106440.41666666667,112064.66666666667,111635.58333333333,110168.58333333333,111241.25,62634.083333333336,68203.33333333333,71515.16666666667,80674.66666666667]}
Series has a similar distribution:
The AIC value is the Akaike information criterion, it compares the forecasting models to each other. The code for testing out different ARIMA models and calculating a range of ARIMA models to see which has the lowest AIC value
def AIC_iteration_i(train):
filterwarnings("ignore")
#X = df2.values
history = [x for x in train.iloc[:,0]]
p = d = q = range(0,6)
pdq = list(product(p,d,q))
aic_results = []
parameter = []
for param in pdq:
try:
model = ARIMA(history, order=param)
results = model.fit(disp=0)
# You can print each (p,d,q) parameters uncommented line below
#print('ARIMA{} - AIC:{}'.format(param, results.aic))
aic_results.append(results.aic)
parameter.append(param)
except:
continue
d = dict(ARIMA=parameter, AIC=aic_results)
results_table = pd.DataFrame(dict([ (k, pd.Series(v)) for k,v in d.items()]))
# AIC minimum value
order = results_table.loc[results_table['AIC'].idxmin()][0]
return order
it returns the same order (0, 2, 1)
for the (p,d,q)
parameters with the lowest AIC value for each series.
The prediction I got it with code below but result are no OK for hour 2
# time series hora0.iloc[:,0] and hora1.iloc[:,0] from pandas df
trained = list(hora0.iloc[:,0])
# order got it above (0,2,1)
orders = order
size = math.ceil(len(trained)*.8)
train, test = [trained[i] for i in range(size)] , [trained[i] for i in range(size,len(trained))]
predictions = []
predictionslower = []
predictionsupper = []
for k in range(len(test)):
model = ARIMA(trained, order=orders)
model_fit = model.fit(disp=0)
forecast, stderr, conf_int = model_fit.forecast()
yhat = forecast[0]
yhatlower = conf_int[0][0]
yhatupper = conf_int[0][1]
predictions.append(yhat)
predictionslower.append(yhatlower)
predictionsupper.append(yhatupper)
obs = test[k]
trained.append(obs)
#error = mean_squared_error(test, predictions)
predictions
prediction for
hour0 [113815.15072419723,128600.77967037176,131580.85654685542,83200.24743417211,83167.65192576911,95062.06180437957]`
prediction for `hour1 [79564.70753715932,112491.2694928094,114410.34654966182,60882.18766484651,nan,nan]
The AIC for series 2 also I check with pmd-arima
which order is same values for SARIMAX model. Please give me some light.
Upvotes: 0
Views: 525
Reputation: 462
The issue with the values in hour2 (also in other hours) for data is non stationary in time series, for removing non-stationary we can either apply a differentiation or natural logarithm to raw data:
hora2 = np.log('hora2')
{'date':['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
'hora2':[11.3358163,11.33043889,11.19715594,11.18294461,11.21091456,11.24633712,11.30247292,11.44666635,11.45650413,11.37535928,11.42370164,11.21212325,11.06208373,11.23769005,11.28858328,11.34374123,11.24812624,11.3531948,11.27926114,11.42660022,11.50369886,11.57534064,11.62683136,11.62299513,11.60976705,11.61945655,11.04506487,11.13024872,11.17766483,11.29817989]}
Once obtained the order for model ARIMA(trained, order=orders)
with minimized AIC value (Akaike Information Criterion) for each "horaX" series. Some series still return NaN
values in prediction and I had to take second or third minimized AIC value, the prediction result returned, applied exponential logarithm for recovery original values.
{'hora2':[11.6948938,12.00191037,11.81401922,11.77476296,11.83965601,11.89443423]}
hora2 = np.exp('hora2')
{'hora2':[119957.62142129,163066.00981609,135133.60347713,129931.53854787,138642.78415756,146449.24980086]}
the prediction result over test data is depict in picture:
Upvotes: 1