Reputation: 291
I've been studying cross-validation, and as far as I understand, the idea of k-fold validation is that you evaluate your model on different slices of your dataset and then average the error.
For instance, I would take my dataset and split it into 3 parts. First, I would train a model on parts 2 and 3 and test it on part 1. Then, I would train on parts 1 and 3, and test on part 2. Last, I would train on parts 1 and 2 and test on part 3.
With Lasso regression, for instance, each of the 3 models I fit (as described above) would have its own set of coefficients (betas). So in effect I have 3 models, but what I was really evaluating was my choice of hyperparameters and other treatments.
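To make this concrete, here is a minimal sketch of that 3-fold procedure with scikit-learn's `KFold` and `Lasso`; the synthetic dataset and the `alpha` value are illustrative assumptions on my part, not from the question:

```python
# 3-fold cross-validation of a Lasso with a fixed hyperparameter:
# each fold trains on 2 parts, tests on the held-out part, and
# produces its own set of coefficients.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
scores = []
for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    model = Lasso(alpha=1.0)                       # same hyperparameter in every fold
    model.fit(X[train_idx], y[train_idx])          # train on the other 2 parts
    score = model.score(X[test_idx], y[test_idx])  # test on the held-out part
    scores.append(score)
    print(f"fold {i}: R^2 = {score:.3f}, coefficients = {model.coef_.round(2)}")

print(f"mean R^2 across folds: {np.mean(scores):.3f}")
```

Running this shows exactly the situation described: 3 different coefficient vectors, but a single averaged score that rates the hyperparameter choice.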
Now, let's move to scikit-learn's cross-validation. Consider the diagram below from their website:
It says "finding parameters" in the cross-validation step. How would it find different parameters and then test on a final set? I mean, each fold's model would have different parameters.
Upvotes: 2
Views: 300
Reputation: 902
I think the "finding parameters" part just refers to computing the score on each split when the model is trained and evaluated there, which tells you how well the model performs. You would repeat this K-fold procedure for different hyperparameter choices and eventually pick the hyperparameter with the best average performance. That best hyperparameter is then used to retrain on all of the training data, producing the final model that you evaluate on the test set.
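For concreteness, here is a short sketch of that workflow using scikit-learn's `GridSearchCV`, which by default (`refit=True`) retrains the best model on the full training data after the search; the dataset and the `alpha` grid are just placeholders:

```python
# Search over alpha with 3-fold CV on the training data, then evaluate
# the refit best model once on the held-out test set.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

search = GridSearchCV(
    Lasso(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},  # candidate hyperparameters
    cv=3,                                          # 3-fold CV on the training data
)
search.fit(X_train, y_train)

print("best alpha:", search.best_params_["alpha"])
print("mean CV score of best alpha:", search.best_score_)
print("final test-set score:", search.score(X_test, y_test))  # refit model
```

So the per-fold coefficients are discarded; only the winning hyperparameter survives, and one final model is trained with it before touching the test set.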
Upvotes: 0