bazinga

Reputation: 2250

Target variable for time series analysis

I am working on a time series prediction for the first time and am a little confused about how to create the target variable. The data looks like:

Sample Dataframe

I am trying to predict the percentage change in sales for the first quarter of 2019 for customer A. One way I thought of deriving the target is to take a rolling average of the past 3 months and shift it by 1. After this manipulation, it looks like:

Rolled Target

But I am not sure which is right: should I take the average of Jan, Feb, and March as the target for April, or the average of Feb, March, and April as the target for Jan?

Upvotes: 0

Views: 899

Answers (1)

razimbres

Reputation: 5015

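Time series prediction is based on the principle of autocorrelation: the values in one window of the series, say from Xn to Xn+100, are correlated with the values in the same window shifted by a time lag, from Xn+time_lag to Xn+100+time_lag.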

You will notice that the bigger the time lag, the smaller the autocorrelation and the worse the predictive power of your model:

Autocorrelation
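To put numbers on this, here is a minimal sketch that measures autocorrelation at a few lags with `Series.autocorr`. The series is synthetic (an AR(1) process I made up for illustration, not your sales data), and the lags 1, 3, 6, and 12 are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Synthetic series (an assumption for illustration): an AR(1) process
# in which each value is 0.8 times the previous value plus random noise.
rng = np.random.default_rng(0)
n = 500
values = np.zeros(n)
for t in range(1, n):
    values[t] = 0.8 * values[t - 1] + rng.normal()
series = pd.Series(values)

# Correlation between the series and a copy of itself shifted by `lag` steps.
autocorr_by_lag = {lag: series.autocorr(lag=lag) for lag in (1, 3, 6, 12)}

for lag, r in autocorr_by_lag.items():
    print(f"lag {lag:>2}: autocorrelation = {r:.3f}")
```

On this synthetic series the value at lag 1 is far higher than at lag 12. Real sales data may not decay so smoothly; with yearly seasonality, for example, the autocorrelation can rise again around lag 12.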

If you create a rolling mean, you will lose information, creating a fuzzy target. I would use the target itself for better predictions.

What I mean is that you use the same target variable for both x_train and y_train, with a time lag between them.
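A minimal sketch of building such lagged inputs with `shift`, assuming a monthly column named `target`. The values, the month labels, and the choice of three lags are all made up for illustration.

```python
import pandas as pd

# Hypothetical monthly values; replace with your own column.
months = ["2018-01", "2018-02", "2018-03", "2018-04", "2018-05", "2018-06"]
df = pd.DataFrame({"target": [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]}, index=months)

time_lag = 1  # gap between the newest input month and the predicted month
n_lags = 3    # how many past months to use as inputs

# Each lag_k column holds the target value from k months earlier.
lagged = pd.DataFrame(
    {f"lag_{k}": df["target"].shift(k) for k in range(time_lag, time_lag + n_lags)}
)
lagged["y"] = df["target"]

# The first rows have no complete history, so drop them.
supervised = lagged.dropna()
x_train = supervised.drop(columns="y")
y_train = supervised["y"]

print(supervised)
```

With this layout, the row for April holds the March, February, and January values as inputs and the April value as the target, so every input comes strictly before the month being predicted.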

Then you can use ARIMA, linear regression, LSTM neural networks, other neural networks, or Temporal Convolutional Networks to map from input to target.

Check the level of autocorrelation of your data:

import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(dataframe['target'])  # 'target' is the column you want to predict
plt.show()

Upvotes: 1
