Snoop2001

Reputation: 25

Questionable output from LeaveOneOut

My dataset has 93 observations and 24 features. I am using an SVM model to classify each observation as either class 0 or class 1.

I have some questions about the leave-one-out cross-validation (LOOCV) approach I used, specifically regarding the accuracy, precision, recall, and AUC.

I have tested the methods in my code below, but something is definitely wrong, which you can see from the reported accuracy spread of +/- 0.91 (twice the standard deviation of the fold scores).

What did I miss?

Let me know if you need more information. Thanks!

# Create the feature matrix and the class labels
x = np.array(df.drop(['target'], axis=1))
y = np.array(df['target'])
xs = scale(x)



# Here is the LOOCV code to compute accuracy
svm_model = SVC(C=0.1, kernel='linear', probability=True)
loo = LeaveOneOut(93)

acc = cross_val_score(estimator=svm_model,
                      X=xs,
                      y=y,
                      cv=loo)
print(acc)
print("Accuracy: %0.2f (+/- %0.2f)" % (acc.mean(), acc.std() * 2))
#prints 0.71 +- 0.91 
    [0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 
    1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0]



# Here is what I tried to get precision and recall
predicted = cross_val_predict(svm_model, xs, y, cv=loo)
print(recall_score(y, predicted))
# prints 23%

print(precision_score(y, predicted))
# prints 46%

print(roc_auc_score(y, predicted))
# prints 56%

Upvotes: 0

Views: 121

Answers (1)

James Dellinger

Reputation: 1261

According to the scikit-learn documentation for LeaveOneOut, the split() method is what actually generates the train/test indices for all the CV splits:

loo = LeaveOneOut()
loo.split(xs, y)

I believe the two lines above should replace the line loo = LeaveOneOut(93) that you had written. If you look at the source code for the __init__() method used by LeaveOneOut, you can see that nothing is done with any arguments that may be passed to it. I believe this is why you saw no error message when you created your loo object by passing it the integer 93.
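To make the replacement concrete, here is a minimal sketch, assuming the sklearn.model_selection API and using synthetic data in place of your xs and y (the random data, the seed, and the way the labels are generated are my own stand-ins, not your dataset). As far as I understand, cross_val_score calls split() internally, so the LeaveOneOut() object can be passed straight to cv.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import scale
from sklearn.svm import SVC

# Synthetic stand-in for a 93 x 24 dataset (hypothetical, not the asker's data).
rng = np.random.RandomState(0)
x = rng.normal(size=(93, 24))
y = (x[:, 0] + rng.normal(size=93) > 0).astype(int)
xs = scale(x)

svm_model = SVC(C=0.1, kernel="linear")
loo = LeaveOneOut()  # no arguments; the number of folds comes from the data

# cross_val_score asks the splitter for its train/test indices itself.
acc = cross_val_score(estimator=svm_model, X=xs, y=y, cv=loo)

p = acc.mean()
print("Folds: %d" % len(acc))
print("Accuracy: %0.2f (+/- %0.2f)" % (p, acc.std() * 2))
# Every fold holds out one sample, so each fold score is either 0 or 1
# and the spread of the scores is fixed by the mean accuracy.
print("2 * sqrt(p * (1 - p)) = %0.2f" % (2 * np.sqrt(p * (1 - p))))
```

One thing worth noting: because each leave-one-out fold scores exactly 0 or 1, the standard deviation of the fold scores is sqrt(p * (1 - p)) for a mean accuracy p. For p = 0.71 that gives 2 * sqrt(0.71 * 0.29), which is about 0.91, so a spread of that size is expected with this kind of cross-validation and is not on its own a sign of a bug.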

Just below the __init__() method in the source code, you'll see that the split() method accepts arguments (the training data and labels) and then yields the train/test indices for each CV fold (93 folds, in your case).
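A small sketch of what split() yields, assuming the sklearn.model_selection version of LeaveOneOut and a tiny made-up array of 4 observations (the data here is purely illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Tiny hypothetical dataset: 4 observations, 2 features.
X = np.arange(8).reshape(4, 2)
y = np.array([0, 1, 0, 1])

loo = LeaveOneOut()

# split() is a generator; each item is a (train_indices, test_indices) pair.
splits = list(loo.split(X, y))

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)

# One fold per observation.
print("number of folds:", loo.get_n_splits(X))
```

Each fold holds out exactly one observation and trains on the rest, so with your 93 observations you would get 93 such pairs.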

Upvotes: 1
