Reputation: 313
I am using cross_val_score to compute the mean score for a regressor. Here's a small snippet.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
cross_val_score(LinearRegression(), X, y_reg, cv=5)
Using this I get an array of scores. I would like to know how the scores on the validation set (as returned in the array above) differ from those on the training set, to understand whether my model is over-fitting or under-fitting.
Is there a way of doing this with the cross_val_score function?
Upvotes: 7
Views: 4802
Reputation: 798
You can use cross_validate instead of cross_val_score. According to the docs:
The cross_validate function differs from cross_val_score in two ways:
- It allows specifying multiple metrics for evaluation.
- It returns a dict containing training scores, fit-times and score-times in addition to the test score.
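For example (a minimal sketch, reusing the X and y_reg from the question; return_train_score=True is needed because it defaults to False in recent scikit-learn versions):
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

cv_results = cross_validate(
    LinearRegression(), X, y_reg, cv=5,
    return_train_score=True,  # ask for training-fold scores as well
)
print(cv_results["train_score"])  # scores on the 5 training folds
print(cv_results["test_score"])   # scores on the 5 validation folds
Comparing cv_results["train_score"] with cv_results["test_score"] tells you whether the model fits the training folds much better than the validation folds, which is the over-fitting signal you are after.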
Upvotes: 21
Reputation: 1998
Why would you want that? cross_val_score(cv=5) already does that for you: it splits your training data into 5 folds and evaluates the score on each of the 5 validation subsets. This already serves as a way to detect over-fitting.
Anyway, if you want to verify the accuracy on your validation data yourself, you first have to fit your LinearRegression on X and y_reg, as in the sketch below.
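A minimal sketch, assuming you hold out part of the question's X and y_reg as a validation split:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y_reg, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_train, y_train))  # R^2 on the training split
print(model.score(X_val, y_val))      # R^2 on the held-out split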
Upvotes: -5