Reputation: 1246
I have a dataset that is divided into training and test parts. My task is to train a model and evaluate it using k-fold cross validation. I'm a bit confused by the task statement. As far as I know, the point of k-fold cross validation is to evaluate a model on a limited data sample by using all of the data for both training and testing. Please tell me if my algorithm is correct:
Upvotes: -1
Views: 899
Reputation:
Yes, you are doing it right. The whole point of using K-fold cross validation is that we have limited data, and it ensures that every observation from the original dataset has a chance of appearing in both the training and the test set.
The steps, as you mentioned:
Split the entire dataset randomly into K folds (the value of K shouldn't be too small or too large; ideally we choose 5 to 10 depending on the data size).
Fit the model on K − 1 folds and validate it on the remaining Kth fold. Record the score and errors.
Repeat this process until every fold has served as the test set. Then take the average of your recorded scores. That is the performance metric for the model.
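The steps above can be sketched with scikit-learn's `KFold`. The dataset and classifier here (iris, logistic regression) are stand-ins for illustration; substitute your own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Placeholder data and model -- not from the question, just an example.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Step 1: split the data randomly into K folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, test_idx in kf.split(X):
    # Step 2: fit on K-1 folds, validate on the held-out fold, record the score.
    model.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Step 3: average the recorded scores to get the performance metric.
mean_score = np.mean(scores)
print(mean_score)
```

`cross_val_score` wraps this whole loop in one call if you don't need control over each fold.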
Edit for point 1: A higher value of K gives a less biased estimate, but the larger variance might lead to overfitting, whereas a lower value of K behaves more like the simple train-test split approach. Hence we choose a K value between 5 and 10. You can experiment with these values to get a better performance metric.
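To see the bias/variance trade-off for yourself, you can compare the mean score and its spread across a few values of K. This is only a sketch with a stand-in dataset and model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and model -- substitute your own.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

results = {}
for k in (3, 5, 10):
    # For a classifier and integer cv, scikit-learn uses stratified K-fold.
    scores = cross_val_score(model, X, y, cv=k)
    results[k] = (scores.mean(), scores.std())
    print(f"K={k}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The std column gives a rough feel for how the variance of the estimate changes with K.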
Upvotes: 2