Reputation: 51
I am learning machine learning and have a question. Can anyone tell me the difference between:
from sklearn.model_selection import cross_val_score
and
from sklearn.model_selection import KFold
I think both are used for k-fold cross-validation, but I am not sure why there are two different imports for the same task. If I am missing something, please let me know. (If possible, please explain the difference between the two.)
Thanks.
Upvotes: 4
Views: 6436
Reputation: 1
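cross_val_score evaluates a model using cross-validation: it splits the data into distinct subsets called folds, then trains and evaluates the model once per fold, holding out a different fold for evaluation each time and training on the remaining folds. Note that with an integer cv the data is not shuffled by default; for a classifier with a binary or multiclass target the folds are stratified (StratifiedKFold), otherwise a plain KFold is used.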
cv_score = cross_val_score(model, data, target, scoring=scoring, cv=cv)
KFold divides a dataset into k non-overlapping folds. Each of the k folds is used once as a held-out test set, while the remaining folds together form the training set. KFold itself only generates the train/test indices; when it is passed to cross_val_score, k models are fit and evaluated on the k held-out sets and one score per fold is returned, the mean of which is usually what gets reported.
cv = KFold(n_splits=10, random_state=1, shuffle=True)
cv_score = cross_val_score(model, data, target, scoring=scoring, cv=cv)
where model is the estimator you want to evaluate, data is the training data, target is the target variable, scoring selects the metric applied to the estimator, and cv is either the number of folds or a splitter object such as KFold.
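To make the relationship concrete, here is a minimal sketch (not from the original answer) showing that KFold only produces index arrays, and that for a regressor an integer cv behaves like an unshuffled KFold with the same number of splits. The toy data and the choice of LinearRegression are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy data, made up for illustration: 6 samples, 1 feature
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])

# KFold only yields (train_idx, test_idx) pairs; it never touches a model
kfold = KFold(n_splits=3)  # shuffle=False by default
test_folds = [test_idx.tolist() for _, test_idx in kfold.split(X)]
print(test_folds)  # [[0, 1], [2, 3], [4, 5]]

# For a regressor, cv=3 and cv=KFold(n_splits=3) give the same folds
model = LinearRegression()
scores_int = cross_val_score(model, X, y, cv=3,
                             scoring="neg_mean_squared_error")
scores_kfold = cross_val_score(model, X, y, cv=kfold,
                               scoring="neg_mean_squared_error")
print(np.allclose(scores_int, scores_kfold))  # True
```

Passing a KFold object instead of an integer mainly matters when you want control over the split, for example shuffle=True with a fixed random_state.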
Upvotes: 0
Reputation: 4960
cross_val_score is a function that evaluates an estimator on the data using cross-validation and returns the score for each fold.
On the other hand, KFold is a class that lets you split your data into K folds.
So, these are completely different things. You can create a K-fold splitter and use it for cross-validation like this:
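from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

# create a splitter object
kfold = KFold(n_splits=10)
# define your model (any model); params is your own dict of hyperparameters
model = XGBRegressor(**params)
# pass your model and KFold object to cross_val_score
# to fit and get the negated RMSE of each fold of data (X, y are your data)
cv_score = cross_val_score(model,
                           X, y,
                           cv=kfold,
                           scoring='neg_root_mean_squared_error')
# scores are negated RMSE values, so values closer to 0 are better
print(cv_score.mean(), cv_score.std())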
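If it helps to see what cross_val_score does with the splitter, the following is a rough sketch of the equivalent manual loop. It is a simplification rather than scikit-learn's actual implementation, and it assumes synthetic data from make_regression and a LinearRegression model in place of XGBRegressor so that it runs without extra dependencies.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, assumed here only for illustration
X, y = make_regression(n_samples=100, n_features=5, noise=10.0,
                       random_state=0)
model = LinearRegression()
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

# Manual version: KFold supplies the indices, we fit and score ourselves
manual_scores = []
for train_idx, test_idx in kfold.split(X):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    pred = fold_model.predict(X[test_idx])
    rmse = np.sqrt(mean_squared_error(y[test_idx], pred))
    manual_scores.append(-rmse)  # negated, to match the sklearn scorer

# cross_val_score wraps the same split / fit / score steps in one call
auto_scores = cross_val_score(model, X, y, cv=kfold,
                              scoring="neg_root_mean_squared_error")
print(np.allclose(manual_scores, auto_scores))
```

The manual loop is worth writing out only when you need something cross_val_score does not return, such as the per-fold predictions or the fitted models.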
Upvotes: 3