Ivan

Reputation: 20101

Time series cross-validation using linear regression from scikit learn

I'm using the LinearRegression model from scikit-learn to do an explanatory fit on a time series:

from sklearn import linear_model
import numpy as np

# scikit-learn expects X with shape (n_samples, n_features)
X = np.random.random((100, 2))
y = np.random.random(100)

regressor = linear_model.LinearRegression()
regressor.fit(X, y)
y_hat = regressor.predict(X)

I want to cross-validate the prediction. As far as I know, I can't use the cross-validation utilities from sklearn (such as KFold) because they split the data randomly, and I need the folds to be sequential. For example:

data_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# first train set
train = [1]
# first test set
test = [2, 3, 4, 5, 6, 7, 8, 9, 10]
# fit, predict, evaluate

# train set
train = [1, 2]
# test set
test = [3, 4, 5, 6, 7, 8, 9, 10]
# fit, predict, evaluate

...

# train set
train = [1, 2, 3, 4, 5, 6, 7, 8]
# test set
test = [9, 10]
# fit, predict, evaluate

Is it possible to do this using sklearn?

Upvotes: 2

Views: 1395

Answers (1)

valentin

Reputation: 3608

You do not need scikit-learn for this kind of folding; slicing is sufficient. Something like:

step = 1
# start at 1 so the training set is never empty
for i in range(1, len(data_set), step):
  train = data_set[:i]
  test = data_set[i:]
  # etc...
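
The slicing idea can be wrapped in a small generator so the splits are reusable. This is only a sketch: the function name `expanding_window_splits` and its parameters `min_train`, `min_test`, and `step` are made up for illustration and are not part of scikit-learn. With the defaults it reproduces the folds from the question (train on the first element, test on the rest, up to train on the first 8 and test on the last 2).

```python
def expanding_window_splits(n_samples, min_train=1, min_test=2, step=1):
    """Yield (train_idx, test_idx) index lists in time order.

    Hypothetical helper: the training window grows from the start of the
    series, and the test set is always everything after it.
    """
    # The last split point must leave at least `min_test` samples for testing.
    for split in range(min_train, n_samples - min_test + 1, step):
        train_idx = list(range(split))
        test_idx = list(range(split, n_samples))
        yield train_idx, test_idx


data_set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
splits = list(expanding_window_splits(len(data_set)))

for train_idx, test_idx in splits:
    train = [data_set[i] for i in train_idx]
    test = [data_set[i] for i in test_idx]
    print(train, test)
    # fit, predict, evaluate
```

If you do want to stay inside scikit-learn, the `cv` argument of `cross_val_score` accepts an iterable of (train, test) index arrays, so splits like these should be usable there. There is also `sklearn.model_selection.TimeSeriesSplit`, which produces expanding training windows but with fixed-size test folds rather than the full remainder of the series, so it is close to, but not exactly, the scheme in the question.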

Upvotes: 1
