Arturo Sbr

Reputation: 6333

See the score of each fold when cross validating a model using a for loop

I want to see the individual score of each fitted model, so that I can illustrate how much the score changes from fold to fold (I am doing this to show my coworkers why cross-validation is important).

I have a .csv file with 500 rows, 200 independent variables and 1 binary target. I defined skf to split the data into 5 folds using StratifiedKFold.

My code looks like this:

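from sklearn import svm
from sklearn.model_selection import StratifiedKFold

# data is the DataFrame read from the .csv file
X = data.iloc[0:500, 2:202]
y = data["target"]
# no shuffling, so the folds are deterministic and random_state is not needed
skf = StratifiedKFold(n_splits = 5)
clf = svm.SVC(kernel = "linear")
Scores = [0] * 5
for i, j in skf.split(X, y):
    X_train, y_train = X.iloc[i], y.iloc[i]
    X_test, y_test = X.iloc[j], y.iloc[j]
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)  # computed, but not stored in Scores yet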

As you can see, I assigned a list of 5 zeroes to Scores. I would like to store the clf.score(X_test, y_test) of each of the 5 fitted models in that list. However, i and j are not fold numbers such as {1, 2, 3, 4, 5}. Rather, they are arrays of row positions used to select the training and test rows of the X and y data frames.

How can I assign the test scores of each of the k fitted models to Scores within this loop? Do I need a separate index for this?

I know cross_val_score does all of this for you and returns an array with the k scores, which can then be averaged. However, I want to show my coworkers what happens behind the cross-validation functions that come with the sklearn library.

Thanks in advance!

Upvotes: 1

Views: 536

Answers (1)

Giacomo Sachs

Reputation: 231

If I understood the question correctly and you don't need any particular indexing for Scores, you can start from an empty list and append the score of each fold:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# random stand-in data with the same shape as in the question
X = np.random.normal(size = (500, 200))
y = np.random.randint(low = 0, high=2, size=500)
# no shuffling, so random_state is not needed
skf = StratifiedKFold(n_splits = 5)
clf = SVC(kernel = "linear")
Scores = []
for i, j in skf.split(X, y):
    # i and j are arrays of row positions for the training and test folds
    X_train, y_train = X[i], y[i]
    X_test, y_test = X[j], y[j]
    clf.fit(X_train, y_train)
    Scores.append(clf.score(X_test, y_test))

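The result is a list with one score per fold. The data here is random and unseeded, so the exact values change from run to run, for example:

>>> Scores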
[0.5247524752475248, 0.53, 0.5, 0.51, 0.4444444444444444]
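
If you would rather keep the pre-allocated list from the question, a separate fold counter is enough. Below is a minimal sketch using enumerate, assuming the same kind of random stand-in data as above (seeded here so that runs are repeatable; the seed and the names fold, train_idx, test_idx and reference are my own choices, not something from the question). As a sanity check it also calls cross_val_score with the same splitter, which should give the same per-fold scores, since it returns one score per fold and leaves any averaging to you.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; replace with your own X and y).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))
y = rng.integers(low=0, high=2, size=500)

skf = StratifiedKFold(n_splits=5)  # no shuffling, so the splits are deterministic
clf = SVC(kernel="linear")

# Pre-allocated list, as in the question; enumerate supplies the fold number.
scores = [0.0] * skf.get_n_splits()
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    clf.fit(X[train_idx], y[train_idx])
    scores[fold] = clf.score(X[test_idx], y[test_idx])
    print(f"fold {fold}: accuracy = {scores[fold]:.3f}")

# cross_val_score with the same splitter, for comparison with the manual loop.
reference = cross_val_score(SVC(kernel="linear"), X, y, cv=skf)
print("manual loop    :", np.round(scores, 3))
print("cross_val_score:", np.round(reference, 3))
print("mean accuracy  :", reference.mean())
```

With a pandas DataFrame, as in the question, the only change should be indexing with X.iloc[train_idx] and y.iloc[train_idx] instead of X[train_idx] and y[train_idx].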

Upvotes: 1
