Reputation: 2250
I am working on a time series prediction for the first time and am a little confused about how to create the target variable. The data looks like:
I am trying to predict the percentage change in sales for the first quarter of 2019 for customer A. One way I thought of deriving the target is to take a rolling average of the past 3 months and shift it by 1. After manipulation, it looks like:
But I am confused: should I take the average of Jan, Feb, March as the target for April, or the average of Feb, March, April as the target for Jan?
Upvotes: 0
Views: 899
Reputation: 5015
Time series prediction is based on the principle of autocorrelation, i.e. how y over the window Xn to Xn+100 relates to the same series over the window Xn+time_lag to Xn+100+time_lag.
You will notice that the bigger the time lag, the smaller the autocorrelation and the worse the predictive power of your model.
If you create a rolling mean, you will lose information and end up with a fuzzy target. I would use target itself for better predictions.
What I mean is that you use the same variable, target, as both x_train and y_train, creating a time lag between them.
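The lagged pairing described above can be sketched with pandas roughly as follows. The sales figures, the month labels, and the one-month time_lag are assumptions made up for illustration; they are not taken from the question's data.

```python
import pandas as pd

# Hypothetical monthly sales for one customer (illustrative values only).
sales = pd.Series(
    [100.0, 110.0, 121.0, 115.0, 130.0, 140.0],
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    name="target",
)

time_lag = 1  # assumed lag of one month

# The input is the series as observed; the target is the same series
# shifted back by `time_lag`, so each row pairs month t with month t + time_lag.
frame = pd.DataFrame({"x": sales, "y": sales.shift(-time_lag)}).dropna()

x_train = frame["x"].to_numpy()
y_train = frame["y"].to_numpy()
```

The last month is dropped because it has no future value to pair with. A larger time_lag drops more rows from the end in the same way.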
Then you can use ARIMA, LSTM neural networks, linear regression, feed-forward neural networks, or temporal convolutional networks to map from input to target.
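As one possible instance of mapping from lagged input to target, here is a minimal sketch of linear regression on the previous few values, solved with NumPy least squares. The synthetic series and the choice of n_lags = 3 are assumptions for illustration only; any of the other models listed could take the same X and y.

```python
import numpy as np

# Synthetic series (an assumption for illustration): a noisy upward trend.
rng = np.random.default_rng(0)
series = np.arange(50, dtype=float) + rng.normal(scale=0.5, size=50)

n_lags = 3  # hypothetical number of past values used as features

# Row i of X holds series[i : i + n_lags]; y[i] is the value that follows.
X = np.column_stack(
    [series[k : len(series) - n_lags + k] for k in range(n_lags)]
)
y = series[n_lags:]

# Append an intercept column and solve ordinary least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast from the last n_lags observations.
last_window = np.append(series[-n_lags:], 1.0)
forecast = float(last_window @ coef)
```

On real data you would fit on the earlier part of the series and evaluate on the later part, so the model is never scored on values it has already seen.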
Check the level of autocorrelation of your data:
import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(dataframe['target'])  # autocorrelation vs. lag
plt.show()
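If you prefer numbers to a plot, pandas can also report the autocorrelation at specific lags through Series.autocorr. The sketch below uses a synthetic random walk as a stand-in for dataframe['target'], which is an assumption; the lags 1, 5 and 20 are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Synthetic random walk standing in for dataframe['target'] (an assumption).
rng = np.random.default_rng(0)
target = pd.Series(np.cumsum(rng.normal(size=200)))

# Pearson correlation between the series and itself shifted by each lag.
acf_by_lag = {lag: target.autocorr(lag=lag) for lag in (1, 5, 20)}

for lag, value in acf_by_lag.items():
    print(f"lag {lag}: autocorrelation {value:.3f}")
```

Comparing the values across lags gives a rough idea of how far ahead the series still carries usable signal.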
Upvotes: 1