Reputation: 167
I have the following code using linear_model.Lasso:
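from sklearn import linear_model
from sklearn.model_selection import train_test_split  # older releases: sklearn.cross_validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = linear_model.Lasso()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # for a regressor, score() returns R^2, not accuracy
print(accuracy)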
I want to perform k-fold cross-validation (10 folds, to be specific). What would be the right code to do that?
Upvotes: 4
Views: 12272
Reputation: 6376
Here is the code I use to perform cross-validation on a linear regression model and to get the details:
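import numpy as np
from sklearn.model_selection import cross_val_score
# 10-fold CV; each score is the negated MSE of one fold (greater is better)
scores = cross_val_score(clf, X_train, y_train, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)  # flip the sign back to MSE, then take the square root to get RMSE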
As explained in this book (page 108), this is why we use -scores:
Scikit-Learn cross-validation features expect a utility function (greater is better) rather than a cost function (lower is better), so the scoring function is actually the opposite of the MSE (i.e., a negative value), which is why the preceding code computes -scores before calculating the square root.
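To check the sign convention described in the quote, here is a minimal sketch that computes each fold's MSE by hand on the same KFold splits and compares it with -scores. It assumes scikit-learn 0.18 or newer and uses synthetic data from make_regression, which is not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (illustrative only; substitute your own X and y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
clf = Lasso()
kf = KFold(n_splits=10)

# scikit-learn's scores: negated MSE, one value per fold
scores = cross_val_score(clf, X, y, scoring="neg_mean_squared_error", cv=kf)

# Per-fold MSE computed manually on exactly the same splits
manual_mse = []
for train_idx, test_idx in kf.split(X):
    clf.fit(X[train_idx], y[train_idx])
    manual_mse.append(mean_squared_error(y[test_idx], clf.predict(X[test_idx])))
manual_mse = np.array(manual_mse)

# Negating the scores should recover the ordinary (positive) MSE values
print(np.allclose(-scores, manual_mse))
rmse_scores = np.sqrt(-scores)
```

Because every reported score is less than or equal to zero, taking np.sqrt(scores) directly would produce NaN values; negating first is what makes the RMSE computation valid.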
To display the results, use this simple function:
def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())
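A short end-to-end usage sketch: it runs 10-fold cross-validation on Lasso, converts the negated MSE to RMSE, and passes the result to display_scores. The synthetic data from make_regression is an assumption for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

def display_scores(scores):
    # Print the per-fold scores and their summary statistics
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

# Illustrative data; replace with your own features and targets
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
scores = cross_val_score(Lasso(), X, y, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)  # one RMSE value per fold
display_scores(rmse_scores)
```

The standard deviation gives a rough sense of how much the model's error varies from fold to fold, which a single train/test split cannot show.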
Upvotes: 5
Reputation: 23770
You can run 10-fold cross-validation using the model_selection module:
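from sklearn import linear_model
# for 0.18 version or newer, use:
from sklearn.model_selection import cross_val_score
# for pre-0.18 versions of scikit-learn, use this instead:
# from sklearn.cross_validation import cross_val_score
X = ...  # your feature matrix, shape (n_samples, n_features)
y = ...  # your target values (continuous, since Lasso is a regressor)
clf = linear_model.Lasso()
scores = cross_val_score(clf, X, y, cv=10)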
This returns 10 scores, one per fold (for a regressor like Lasso, the default score is R²). You can get their mean with:
scores.mean()
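For a regressor such as Lasso, passing cv=10 makes scikit-learn split the data with an unshuffled KFold of 10 folds. The sketch below writes that loop out by hand, which can be handy if you also want per-fold predictions or coefficients; it should reproduce the same R² values as cross_val_score. The synthetic data from make_regression is an assumption for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
clf = Lasso()

# Manual equivalent of cross_val_score(clf, X, y, cv=10) for a regressor
kf = KFold(n_splits=10)  # no shuffling, matching the integer cv behaviour
manual_scores = []
for train_idx, test_idx in kf.split(X):
    model = clone(clf)  # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 on the held-out fold
manual_scores = np.array(manual_scores)

scores = cross_val_score(clf, X, y, cv=10)
print(np.allclose(manual_scores, scores))
print(manual_scores.mean())
```

If your rows are ordered (for example sorted by time or by target), pass cv=KFold(n_splits=10, shuffle=True, random_state=0) instead of cv=10 so that each fold sees a representative mix of samples.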
Upvotes: 2