Alex Ivanov

Reputation: 737

Kfold, cross_val_score: on the basis of what data the output is shown (sklearn wrapper)?

I can't understand the output of

kfold_results = cross_val_score(xg_cl, X_train, y_train, cv=kfold, scoring='roc_auc')

The output of xgb.cv is clear - there are the train and test scores:

[0] train-auc:0.927637+0.00405497   test-auc:0.788526+0.0152854
[1] train-auc:0.978419+0.0018253    test-auc:0.851634+0.0201297
[2] train-auc:0.985103+0.00191355   test-auc:0.86195+0.0164157
[3] train-auc:0.988391+0.000999448  test-auc:0.870363+0.0161025
[4] train-auc:0.991542+0.000756701  test-auc:0.881663+0.013579

But the result of cross_val_score in the scikit-learn wrapper is ambiguous: it is a list of scores, one per fold, but is each score computed on the test data or on the train data?

Upvotes: 2

Views: 243

Answers (1)

Celius Stingher

Reputation: 18367

KFold splits the data into the number of folds you pass. If cv is left as None, a default is used; the scikit-learn documentation notes: "Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22." So with 5 folds (the default from version 0.22), the dataset is split into 5 subsets, and on each iteration 4 are used for training and 1 for validation. Each score is computed on the held-out validation fold, not on the training data. Therefore the output is an array of 5 items, one per iteration.
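To see which data the scores come from, here is a minimal sketch that compares cross_val_score with cross_validate, which can also report train scores when return_train_score=True. The synthetic dataset and the LogisticRegression model are stand-ins chosen only to keep the example self-contained; they are not the asker's X_train, y_train, or xg_cl, but an XGBoost sklearn-wrapper estimator should be usable in the same place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, cross_validate

# Synthetic stand-in data (assumption: replaces the asker's X_train / y_train).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stand-in estimator (assumption: replaces the asker's xg_cl).
model = LogisticRegression(max_iter=1000)

# Fixed random_state so both calls below see identical splits.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold, each computed on that fold's held-out part.
test_scores = cross_val_score(model, X, y, cv=kfold, scoring="roc_auc")

# cross_validate exposes the same held-out scores plus, on request, train scores.
results = cross_validate(
    model, X, y, cv=kfold, scoring="roc_auc", return_train_score=True
)

print("cross_val_score:         ", test_scores)
print("cross_validate test AUC: ", results["test_score"])
print("cross_validate train AUC:", results["train_score"])
```

The first two printed arrays should match, which is a quick way to confirm that cross_val_score reports held-out (test) scores. The train scores are the counterpart of the train-auc column in the xgb.cv output.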

Upvotes: 1
