Reputation: 6333
I want to see the individual score of each fitted model to visualize the strength of cross-validation (I am doing this to show my coworkers why cross-validation is important).
I have a .csv file with 500 rows, 200 independent variables and 1 binary target. I defined skf to fold the data 5 times using StratifiedKFold.
My code looks like this:
X = data.iloc[0:500, 2:202]
y = data["target"]
skf = StratifiedKFold(n_splits=5, random_state=0)
clf = svm.SVC(kernel="linear")
Scores = [0] * 5
for i, j in skf.split(X, y):
    X_train, y_train = X.iloc[i], y.iloc[i]
    X_test, y_test = X.iloc[j], y.iloc[j]
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)
As you can see, I assigned a list of 5 zeroes to Scores. I would like to assign the clf.score(X_test, y_test) of each of the 5 fitted models to that list. However, the indices i and j are not {1, 2, 3, 4, 5}. Rather, they are arrays of row indices (the training and test rows, respectively) used to split the X and y data frames.
How can I assign the test score of each of the k fitted models to Scores within this loop? Do I need a separate index for this?
I know cross_val_score does all of this for you and returns the k scores (which you can then average). However, I want to show my coworkers what happens behind the cross-validation functions that come with the sklearn library.
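For context, this is roughly the one-liner I mean (just a sketch, using the same clf and skf defined above):
from sklearn.model_selection import cross_val_score
# fits and scores clf on each of the 5 folds, returning an array of 5 test scores
fold_scores = cross_val_score(clf, X, y, cv=skf)
print(fold_scores)         # one accuracy per fold
print(fold_scores.mean())  # the usual single-number summary (arithmetic mean)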
Thanks in advance!
Upvotes: 1
Views: 536
Reputation: 231
If I understood the question correctly, and you don't need any particular indexing for Scores:
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# dummy data standing in for your csv: 500 rows, 200 features, binary target
X = np.random.normal(size=(500, 200))
y = np.random.randint(low=0, high=2, size=500)

# note: newer scikit-learn versions require shuffle=True when random_state is set
skf = StratifiedKFold(n_splits=5, random_state=0)
clf = SVC(kernel="linear")

Scores = []
for i, j in skf.split(X, y):
    X_train, y_train = X[i], y[i]
    X_test, y_test = X[j], y[j]
    clf.fit(X_train, y_train)
    Scores.append(clf.score(X_test, y_test))
The result is:
>>> Scores
[0.5247524752475248, 0.53, 0.5, 0.51, 0.4444444444444444]
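If you do want a fold index so you can fill your pre-allocated list of 5 zeroes instead of appending, one option (a sketch of the same loop, nothing else changed) is to wrap skf.split in enumerate:
Scores = [0] * 5
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):  # fold runs 0..4
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    clf.fit(X_train, y_train)
    Scores[fold] = clf.score(X_test, y_test)
This produces the same five numbers as the append version; the only difference is that each score lands at its fold's position in the list.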
Upvotes: 1