Ankit Bansal

Reputation: 337

StratifiedKFold output handling

I have a series with 20 rows and 60 columns, i.e. 20 examples, each with 60 parameters.

kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed)

The output consists of two columns.

I would like to know what the second column means, and on what basis it chooses the two indices. Why not three indices?

Further, I would like to know how the cross-validation function takes this series as input for the "cv" argument. "cv" is generally an integer.

results = cross_val_score(estimator, X, encoded_Y, cv=kfold)

Upvotes: 2

Views: 638

Answers (1)

Ami Tavory

Reputation: 76297

As with all of the cross-validators in sklearn.cross_validation, this is an iterator over pairs of index arrays. In each pair, the first item holds the train indices and the second item holds the test indices.

In the example you give, the first pair has 1 and 17 as the test indices and every other index as the train indices.
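To make the pairs concrete, here is a minimal sketch. It rests on several assumptions that are not in the question: it uses the newer sklearn.model_selection API (where the labels go to split() and n_folds is called n_splits) instead of sklearn.cross_validation, and the random X, the balanced encoded_Y, the seed value and the LogisticRegression estimator are all placeholders. With 20 samples and 10 folds, each test set gets 20 / 10 = 2 indices, which is why you see two and not three. The "cv" argument accepts an integer, a splitter object, or any iterable of (train, test) index pairs, so the list of pairs can be passed as is.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

seed = 7  # placeholder value; the question does not say what seed is
rng = np.random.RandomState(seed)

# Placeholder data with the shape described in the question:
# 20 examples, 60 parameters, two classes with 10 examples each.
X = rng.rand(20, 60)
encoded_Y = np.array([0, 1] * 10)

# In the newer API the labels are passed to split(), not to the constructor.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
splits = list(kfold.split(X, encoded_Y))

# Each element is a (train indices, test indices) pair.
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)

# cv accepts an iterable of (train, test) pairs, so the list works directly.
estimator = LogisticRegression()  # stand-in for the estimator in the question
results = cross_val_score(estimator, X, encoded_Y, cv=splits)
print(results.shape)
```

Passing cv=kfold instead of cv=splits should also work, since cross_val_score calls split() on the splitter itself.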

Upvotes: 0
