Eve Edomenko

Reputation: 510

Usage of nested cross validation for different regressors

I'm working on an assignment in which I have to compare two regressors (random forest and SVR) that I implement with scikit-learn. I want to evaluate both regressors, and after a lot of googling I came across nested cross validation, where the inner loop is used to tune the hyperparameters and the outer loop validates on the k folds of the training set. I would like to use the inner loop to tune both of my regressors and the outer loop to validate both, so that I have the same train and test folds for both regressors.
Is this a proper way to compare two ML algorithms with each other? Are there better ways to compare two algorithms, especially regressors?

I found some entries in blogs but I could not find any scientific paper stating this is a good technique to compare two algorithms with each other, which would be important to me. If there are some links to current papers I would be glad if you could post them, too. Thanks for the help in advance!

EDIT
I have a very small amount of data (approx. 200 samples) with a high number of features (approx. 250 after feature selection, otherwise about 4500), so I decided to use cross validation. My dependent variable is a continuous value from 0 to 1. The problem is a recommender problem, so it makes no sense to test for accuracy in this case. As this is only an assignment, I can only measure the ML algorithms with statistical methods rather than asking users for their opinion or measuring the purchases they make.
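The setup I have in mind looks roughly like the sketch below: one shared outer splitter so both regressors see identical train/test folds, with an inner grid search tuning each model on the outer training fold only. The data here is synthetic and the hyperparameter grids are illustrative placeholders, not a recommendation.

```python
# Hedged sketch: nested CV comparing RandomForestRegressor and SVR on the
# same outer folds. The data is synthetic and the grids are placeholders.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=250, noise=0.1, random_state=0)
y = (y - y.min()) / (y.max() - y.min())  # scale target into [0, 1]

# One shared outer splitter -> identical train/test folds for both regressors.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)

candidates = {
    "rf": (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
    "svr": (SVR(), {"C": [0.1, 1.0]}),
}

scores = {name: [] for name in candidates}
for train_idx, test_idx in outer_cv.split(X):
    for name, (estimator, grid) in candidates.items():
        # Inner loop: tune hyperparameters on the outer training fold only.
        search = GridSearchCV(estimator, grid, cv=inner_cv,
                              scoring="neg_mean_squared_error")
        search.fit(X[train_idx], y[train_idx])
        # Outer loop: score the tuned model on the held-out outer fold.
        scores[name].append(search.score(X[test_idx], y[test_idx]))

for name, s in scores.items():
    print(f"{name}: mean neg MSE = {np.mean(s):.4f}")
```

Since both models are scored on exactly the same outer folds, the per-fold score differences could then be compared directly.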

Upvotes: 0

Views: 272

Answers (1)

tjiagoM

Reputation: 446

I think it depends on what you want to compare. If you just want to compare different models with regard to predictive power (classifiers and regressors alike), nested cross validation is usually a good choice: it lets you find the best set of hyperparameters while avoiding overly optimistic reported metrics: https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
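The pattern in that linked example can be sketched compactly: a `GridSearchCV` acts as the inner loop and `cross_val_score` as the outer loop. The data and grid values below are illustrative only, not taken from the question.

```python
# Sketch of the nested-CV idiom from the linked scikit-learn example:
# GridSearchCV is the inner loop, cross_val_score is the outer loop.
# Synthetic data and an illustrative grid; swap in your own.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=50, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(SVR(), {"C": [0.1, 1.0, 10.0]}, cv=inner_cv)
# Each outer-fold score comes from a model tuned only on that fold's
# training data, so hyperparameter selection does not inflate the estimate.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(nested_scores.mean())
```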

However, sometimes it can be overkill: https://arxiv.org/abs/1809.09446

Also, depending on how the ML algorithms behave, which datasets you are working with, their characteristics, etc., your "comparison" might need to take a lot of things into consideration beyond just predictive power. If you give some more details, we will be able to help more.

Upvotes: 1
