Normalize Time Series - Scikit

Question

I have:

Weekly access counts for 3 Wikipedia articles (A, B, C)
Ground truth data (weekly)
Total English Wikipedia article traffic counts (weekly)

My goal is to build a multiple linear regression on the 3 Wikipedia article access counts and use it to predict future ground truth data.

Before building the regression, I want to do some preprocessing (normalization or scaling) on the 3 access count series.

My data looks like this:

    date        | A (x1) | B (x2) | C (x3) | total_en    | ground truth (y)

    01/01/2008  | 5611   | 606    | 376    | 1467923911  | 3.13599886
    08/01/2008  | 8147   | 912    | 569    | 1627405409  | 2.53335614
    15/01/2008  | 9809   | 873    | 597    | 1744099880  | 2.91287713
    22/01/2008  | 12020  | 882    | 600    | 1804646235  | 3.44497102
    ...         | ...    | ...    | ...    | ...         | ...

Without normalization, I build my multiple linear regression like this:

from sklearn import linear_model
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split

# wiki3: numpy array of shape (150, 3) holding the A, B, C article counts
# ground_truth: numpy array of shape (150, 1) holding the ground truth data

X_train, X_test, y_train, y_test = train_test_split(wiki3, ground_truth, test_size=0.3, random_state=1)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)

My question is: how can I normalize/scale my x1, x2, x3 and y data to get better results?
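
By scaling I mean something like z-score standardization, with the mean and standard deviation learned from the training weeks only. Below is a minimal sketch of that idea. The data is synthetic (random counts and made-up coefficients standing in for my real arrays), and the chronological 70/30 split is an assumption for illustration, not what my code above does. As far as I understand, for plain least squares with an intercept this rescales the coefficients but should not change the predictions, which the sketch lets you check by comparing `scaled_pred` and `raw_pred`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for wiki3 (150, 3) and ground_truth (150,); not real data.
wiki3 = rng.integers(300, 12000, size=(150, 3)).astype(float)
ground_truth = 2.0 + wiki3 @ np.array([1e-4, 2e-4, -1e-4]) + rng.normal(0, 0.1, 150)

# Chronological split (assumption): first 70% of weeks train, the rest test.
split = len(wiki3) * 7 // 10
X_train, X_test = wiki3[:split], wiki3[split:]
y_train, y_test = ground_truth[:split], ground_truth[split:]

# The scaler's mean/std are learned from X_train only, then reused on X_test.
scaled_model = make_pipeline(StandardScaler(), LinearRegression())
scaled_model.fit(X_train, y_train)
scaled_pred = scaled_model.predict(X_test)

# Same model without scaling, for comparison.
raw_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

print(np.abs(scaled_pred - raw_pred).max())
```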

Should I normalize each article by the total English article traffic, or should I use another approach?
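
To make the first option concrete, this is roughly what I have in mind: divide each article's weekly count by that week's total traffic, so the features become shares rather than raw counts. The values are the four rows from the table above; the names `counts`, `share` and `per_million`, and the per-million rescaling, are just my choices for illustration.

```python
import numpy as np

# First four weeks from the table above: columns A, B, C.
counts = np.array([
    [5611, 606, 376],
    [8147, 912, 569],
    [9809, 873, 597],
    [12020, 882, 600],
], dtype=float)

# Total English Wikipedia traffic for the same four weeks.
total_en = np.array([1467923911, 1627405409, 1744099880, 1804646235], dtype=float)

# Share of total traffic per article, per week (total_en broadcast across columns).
share = counts / total_en[:, None]

# Rescale to views per million total views so the numbers are easier to read.
per_million = share * 1e6

print(per_million.round(3))
```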

Is K-fold cross-validation sensible for time series?
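
The alternative I am aware of is a forward-chaining split, where each fold trains on earlier weeks and tests on later ones. scikit-learn provides this as `TimeSeriesSplit`. A minimal sketch of what I mean, using row indices in place of my real data and an arbitrary choice of 5 splits:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 150 weekly rows in chronological order (indices stand in for the real data).
n_weeks = 150
X = np.arange(n_weeks).reshape(-1, 1)

# n_splits=5 is an arbitrary choice for illustration.
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X))

for train_idx, test_idx in folds:
    # Every training index precedes every test index, so no future weeks leak in.
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Unlike shuffled K-fold, the training window here only ever grows forward in time.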

Thanks.
