How does sklearn cross_val_score use kfold?

Question

I am new to machine learning and am trying to understand how cross_val_score uses KFold to split the data into k folds.

from sklearn.model_selection import KFold, cross_val_score
kf = KFold(n_splits=2)
cv_results = cross_val_score(model, X_train, Y_train, cv=kf)

I know KFold splits the data, but when I tried printing it out I got this:

from sklearn.model_selection import KFold
dataset = [[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[6,6,6],[7,7,7],[8,8,8]]
kf = KFold(n_splits=2)
print(kf)

>>> KFold(n_splits=2, random_state=None, shuffle=False)

It doesn't show the k folds, so how does cross_val_score get all the folds?

DataCruncher · Accepted Answer

You need to call kf.split(dataset) to actually split the data. See the KFold documentation for details on how it works.

Just to make it clear, KFold is a class and not a function.

kf = KFold(n_splits=2) creates an instance of KFold, and printing kf just shows that object's representation (its parameters), not the folds themselves.
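To actually see the folds, you can iterate over kf.split. The following is a minimal sketch using the dataset from the question; the variable names folds, train_idx and test_idx are just for illustration. Note that split yields arrays of row indices, not the rows themselves.

```python
from sklearn.model_selection import KFold

dataset = [[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[6,6,6],[7,7,7],[8,8,8]]
kf = KFold(n_splits=2)

# split() is a generator; each item is a (train indices, test indices) pair
folds = list(kf.split(dataset))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)

# train: [4 5 6 7] test: [0 1 2 3]
# train: [0 1 2 3] test: [4 5 6 7]
```

Because shuffle=False by default, the folds are simply consecutive blocks of rows.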

When you call cross_val_score(model, X_train, Y_train, cv=kf), you pass the kf object to cross_val_score, which calls kf.split on X_train internally to get the train/test indices for the 2 folds. Y_train is split with the same indices, so the features and labels stay aligned.
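Roughly speaking, this is what happens for each fold. The sketch below is an approximation of the idea, not scikit-learn's actual implementation: the toy data from make_classification and the LogisticRegression model are assumptions standing in for the X_train, Y_train and model from the question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Assumed stand-ins for the question's data and model
X_train, Y_train = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression()
kf = KFold(n_splits=2)

manual_scores = []
for train_idx, test_idx in kf.split(X_train, Y_train):
    # Fit a fresh copy of the model on the training part of this fold
    fold_model = clone(model)
    fold_model.fit(X_train[train_idx], Y_train[train_idx])
    # Score it on the held-out part, using the same indices for X and Y
    manual_scores.append(fold_model.score(X_train[test_idx], Y_train[test_idx]))

cv_results = cross_val_score(model, X_train, Y_train, cv=kf)
print(manual_scores)
print(cv_results)
```

With the default scoring, both lists should contain the same per-fold scores, one entry per fold.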
