Reputation: 3477
I have been reading about the technique of k-fold cross validation and I came across this example:
>>> from sklearn import datasets, svm, cross_validation
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
...     clf, iris.data, iris.target, cv=5)
>>> scores
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
The mean score and the standard deviation of the score estimate are given by:
>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)
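For reference, here is the same example rewritten against the current scikit-learn API, in which the `cross_validation` module was renamed to `model_selection` (version 0.18 and later):

```python
# Same example, modern scikit-learn API (cross_validation -> model_selection).
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# cv=5 -> five accuracy scores, one per held-out fold
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```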
This source says:
When you perform k-fold CV, you get k different estimates of your model's error, say e_1, e_2, e_3, ..., e_k. Since each e_i is an error estimate, it should ideally be zero.
To check your model's bias, find the mean of all the e_i's. If this value is low, it basically means that your model gives low error on average, indirectly ensuring that your model's notions about the data are accurate enough.
In the SVM example with the iris dataset, the scores have a mean of 0.98, so does this mean that our model is not flexible enough?
Upvotes: 2
Views: 1515
Reputation: 1150
The scores that `cross_val_score` returns are **accuracy**, not error, so higher values are better for you.

Accuracy: 0.98 (+/- 0.03)

Your results show that you have roughly 95% confidence that the mean accuracy will be between 0.95 and 1.
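A quick sketch of where those interval endpoints come from, using approximate per-fold accuracies matching the question's output (mean plus or minus two standard deviations, a common rough 95% interval under a normality assumption):

```python
import numpy as np

# Approximate per-fold accuracies from the question's output.
scores = np.array([0.9667, 1.0, 0.9667, 0.9667, 1.0])

mean = scores.mean()
halfwidth = 2 * scores.std()          # ~95% interval under a normal assumption
lower = mean - halfwidth
upper = min(mean + halfwidth, 1.0)    # accuracy cannot exceed 1

print("Accuracy: %0.2f (+/- %0.2f)" % (mean, halfwidth))   # Accuracy: 0.98 (+/- 0.03)
print("Interval: [%0.2f, %0.2f]" % (lower, upper))         # Interval: [0.95, 1.00]
```

Note the upper end is clipped: 0.98 + 0.03 exceeds 1, but an accuracy above 1 is impossible.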
Upvotes: 1
Reputation: 2226
I think your question stems from a misunderstanding of what k-fold CV is for, so I thought I would explain a couple of things about it.
It's used in machine learning when you have a small sample size and need to test how accurate your model is. K-fold splits your data into k different train/test partitions. With k = 5, each partition uses 20% of the data for testing and 80% for training; which 20% is held out for testing rotates with each split, and likewise which 80% is trained on. This is useful when you are worried about bias from a small amount of data.
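That rotation can be seen directly with scikit-learn's `KFold` splitter (a minimal sketch on a toy dataset of ten samples; `KFold` lives in `sklearn.model_selection` in current versions):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)            # toy dataset: 10 samples
kf = KFold(n_splits=5)       # k=5 -> each fold holds out 20% for testing

# Each sample lands in the test set of exactly one fold;
# the other 80% forms that fold's training set.
for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    print("fold %d  train=%s  test=%s" % (i, train_idx, test_idx))
```

Every fold's test indices are disjoint from the others, and together they cover the whole dataset once.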
The accuracy you get back is how accurately, on average across the k tests, the model identified what you were looking for: in this case, whether each iris was correctly classified.
0.98 is quite a decent number, so your model is fine. That's an error rate of 0.02, which is close to the goal of 0, though it is unlikely to ever hit exactly 0.
Upvotes: 0