Oumayma Riahi

Reputation: 31

Forecasting sales with a 6-year dataset (Python)

I am trying to forecast demand from a 6-year dataset (1/1/2014 to 1/1/2020). First I aggregated demand by month, which gave me a dataset with 2 columns (month and sales) and 72 rows (12 months × 6 years). P.S.: I am working with Python.

My first question: are 72 rows enough to get predictions for the next year (2020)?

My second question: are there any models you would advise me to work with that would give good accuracy?

I have tried a seasonal ARIMA model (SARIMAX) and an LSTM, but neither worked. I am not sure I am doing it right.

My third question: is there a test in Python that tells you whether there is seasonality or not?


import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from pylab import rcParams

# shrink the dataset to one product and one region
# (`data` is the original DataFrame, loaded elsewhere)
dataa = data[(data['Produit'] == 'ACP NOR/STD') & (data['Région'] == 'Europe')]

# sum the tonnage per month
gb2 = dataa.groupby(by=[dataa['Mois'].dt.strftime('%Y, %m')])['Chargé (T)'].sum().reset_index()
# parse the 'YYYY, MM' labels back into dates with an explicit format
gb2.Mois = pd.to_datetime(gb2.Mois, format='%Y, %m')

# create a time series
series = pd.Series(gb2['Chargé (T)'].values, index=gb2.Mois)

# decompose the series into 3 components: trend, seasonality and noise
rcParams['figure.figsize'] = 18, 8
decomposition = sm.tsa.seasonal_decompose(series, model='additive')
fig = decomposition.plot()
plt.show()


# plot the ACF and PACF to choose the model orders
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from matplotlib import pyplot

pyplot.figure()
pyplot.subplot(211)
plot_acf(series, ax=pyplot.gca())
pyplot.subplot(212)
plot_pacf(series, ax=pyplot.gca())
pyplot.show()

import itertools
p = d = q = range(0, 5)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))


import warnings
warnings.filterwarnings("ignore")

# grid search: fit every (p, d, q) x (P, D, Q, 12) combination and print its AIC
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(series,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)

            results = mod.fit()

            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # skip combinations that fail to fit
            continue

# refit the selected model
mod = sm.tsa.statespace.SARIMAX(series,
                                order=(0, 1, 2),
                                seasonal_order=(0, 4, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)

results = mod.fit()

print(results.summary().tables[1])
results.plot_diagnostics(figsize=(16, 8))
plt.show()

# get predictions
pred = results.get_prediction(start=pd.to_datetime('2019-01-01'), dynamic=False)
pred_ci = pred.conf_int()

ax = series['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.8, figsize=(14, 7))

ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)

ax.set_xlabel('Date')
ax.set_ylabel('Chargé (T)')
plt.legend()

plt.show()

The predictions have nothing to do with reality... I would really appreciate anyone's help.

Upvotes: 1

Views: 736

Answers (2)

Petro Franchuk

Reputation: 51

  1. As far as I know, you can produce meaningful predictions with this amount of data (it means each month has 6 data points to fit the model), but use as much data as you can, since more data generally improves accuracy.
  2. There is almost always some seasonality in a time series, and usually a trend as well. So you need to decompose your original time series into trend, seasonal and residual components, and do the prediction on the residuals. Regarding the model, ARIMA is enough for time series prediction; to make it more precise, tune its parameters (p and q) using the PACF and ACF plots.
  3. We decompose to make the time series stationary, in other words to extract the residuals from it (the model should be trained only on stationary data). So you should check for stationarity rather than seasonality; there is the ADF test for that.
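
To make the decomposition step concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is only an illustration under stated assumptions: the series is synthetic (a linear trend plus a yearly sine cycle), the period is even, and the helper names `centered_moving_average` and `decompose_additive` are made up for this example. In practice `sm.tsa.seasonal_decompose` from the question does this job, and the ADF test is available as `adfuller` in `statsmodels.tsa.stattools`.

```python
import math


def centered_moving_average(values, period):
    """Trend estimate: 2 x period centered moving average (even periods only).

    Positions without a full window are returned as None.
    """
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # End points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def decompose_additive(values, period=12):
    """Split a series into trend + seasonal + residual components."""
    trend = centered_moving_average(values, period)
    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(len(values))]
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic monthly series (illustrative only): linear trend + yearly cycle.
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(72)]
trend, seasonal, residual = decompose_additive(series, period=12)
```

On this noise-free series the residuals come out essentially zero, because trend and seasonality explain everything. On real sales data the residual component is what is left to model, and it is the part you would check for stationarity.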

I've done a lot of research on this and had one project on time series prediction; here is an example that describes all the steps:

Upvotes: 1

Akhil

Reputation: 31

Answer to your first question: The data you have collected looks small; it would be better to collect it day-wise so that your model has more to learn from. Since recurrent neural nets perform well on data sampled at short time intervals, I suggest collecting daily data, which would give you roughly 12 × 30 × 6 = 2,160 rows and a much better input for any model.

Answer to the second question: I personally suggest trying LSTMs with more data and well-chosen parameters; a good collection is given in this Medium post.

Performance varies with the parameters, so be careful in selecting the parameters you feed in.
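
One way to judge whether a given parameter choice actually helps is to hold out the last 12 months and compare the model's error against a simple baseline. The sketch below is only an illustration: the numbers are invented, and `seasonal_naive_forecast` and `mean_absolute_error` are hypothetical helper names, not part of any library. The baseline repeats the value seen in the same month one year earlier.

```python
def seasonal_naive_forecast(history, horizon, period=12):
    """Forecast each future step with the value observed one season earlier."""
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative numbers only: 6 years of monthly data with a repeating
# pattern and a small increase each year.
pattern = [50, 48, 55, 60, 66, 72, 80, 78, 65, 58, 52, 49]
series = [value + 3 * year for year in range(6) for value in pattern]

# Hold out the final 12 months; forecast them using only the first 60.
train, test = series[:-12], series[-12:]
baseline = seasonal_naive_forecast(train, horizon=len(test))
baseline_mae = mean_absolute_error(test, baseline)
print(f"seasonal-naive MAE on the hold-out year: {baseline_mae:.2f}")
```

Any SARIMAX or LSTM configuration can be scored the same way, by fitting on `train` only and forecasting 12 steps ahead. If it cannot beat this baseline on the held-out year, the extra parameters are probably not helping.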

Answer to the third question: Seasonality is generally detected using a technique called "anomaly detection". It is also discussed briefly in the Medium post mentioned above.
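
As a simpler alternative check (this is not the method from the post, just a common diagnostic), you can look at the autocorrelation at the seasonal lag: for monthly data, a large positive value at lag 12 after removing the trend suggests yearly seasonality. The sketch below uses a synthetic series, and the helper names `autocorrelation` and `difference` are made up for this example. The threshold 1.96/sqrt(n) is the usual approximate 95% bound for white noise.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


def difference(values, lag=1):
    """Lagged difference; lag=1 removes a linear trend."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Synthetic monthly series (illustrative only): linear trend + yearly cycle.
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(72)]

detrended = difference(series)           # remove the trend first
acf12 = autocorrelation(detrended, 12)   # same month, one year apart
acf6 = autocorrelation(detrended, 6)     # opposite point in the cycle
threshold = 1.96 / math.sqrt(len(detrended))

print(f"ACF at lag 12: {acf12:.2f}, lag 6: {acf6:.2f}, bound: {threshold:.2f}")
if acf12 > threshold:
    print("lag-12 autocorrelation is significant: yearly seasonality is likely")
```

The ACF plot from the question shows the same information graphically: a spike at lag 12 that stands outside the confidence band points to a yearly pattern.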

Upvotes: 0
