CARET. Relationship between data splitting and trainControl

Question

I have carefully read the CARET documentation at http://caret.r-forge.r-project.org/training.html and the vignettes, and everything is quite clear (the examples on the website help a lot!), but I am still a bit confused about the relationship between two arguments to trainControl:

method 
index

and the interplay between trainControl and the data splitting functions in caret (e.g. createDataPartition, createResample, createFolds and createMultiFolds)

To better frame my questions, let me use the following example from the documentation:

library(caret)

data(BloodBrain)  # provides bbbDescr (predictors) and logBBB (outcome)
set.seed(1)
tmp <- createDataPartition(logBBB, p = .8, times = 100)
trControl <- trainControl(method = "LGOCV", index = tmp)
ctreeFit <- train(bbbDescr, logBBB, method = "ctree", trControl = trControl)

My questions are:

1) If I use createDataPartition (which I assume does stratified bootstrapping), as in the example above, and pass the result as index to trainControl, do I need to use LGOCV as the method in my call to trainControl? If I use another one (e.g. cv), what difference would it make? In my head, once you fix index, you are essentially choosing the type of cross-validation, so I am not sure what role method plays if you use index.
2) What is the difference between createDataPartition and createResample? Is it that createDataPartition does stratified bootstrapping, while createResample doesn't?

3) How can I do stratified k-fold (e.g. 10 fold) cross validation using caret? Would the following do it?

tmp <- createFolds(logBBB, k=10, list=TRUE,  times = 100)
trControl = trainControl(method = "cv", index = tmp)
ctreeFit <- train(bbbDescr, logBBB, "ctree",trControl=trControl)
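
One way to explore these questions empirically is to inspect what each splitting function actually returns before handing it to trainControl. The following is only a minimal sketch, assuming the caret package is installed. The helper has_dups is a hypothetical name introduced here for illustration. It checks whether each set of training indices contains repeated rows (which would indicate sampling with replacement) and what fraction of the data each set covers. As far as I can tell, createFolds has no times argument (its arguments are y, k, list and returnTrain), so the sketch uses returnTrain = TRUE to get training rows and createMultiFolds for repeated k-fold.

```r
library(caret)

data(BloodBrain)  # provides bbbDescr (predictors) and logBBB (outcome)
set.seed(1)

# Each call returns a list of integer vectors of row indices.
part  <- createDataPartition(logBBB, p = 0.8, times = 3)   # random splits
boot  <- createResample(logBBB, times = 3)                 # bootstrap samples
folds <- createFolds(logBBB, k = 10, list = TRUE,
                     returnTrain = TRUE)                   # training rows per fold
multi <- createMultiFolds(logBBB, k = 10, times = 5)       # repeated k-fold

# Hypothetical helper: TRUE if a row index appears more than once.
has_dups <- function(idx) any(duplicated(idx))

# Repeated rows suggest sampling with replacement.
print(sapply(part, has_dups))
print(sapply(boot, has_dups))
print(sapply(folds, has_dups))

# Fraction of the data used for training in each split.
print(sapply(part, length) / length(logBBB))
print(sapply(boot, length) / length(logBBB))
print(sapply(folds, length) / length(logBBB))

# The index argument expects one vector of training rows per resample,
# so the fold lists can be passed directly.
ctrl_cv  <- trainControl(method = "cv", index = folds)
ctrl_rcv <- trainControl(method = "repeatedcv", index = multi)
```

Comparing the printed results for part and boot should make the difference between the two functions visible, and the last two lines show one possible way to pass stratified folds to trainControl. Whether method still matters once index is supplied is not settled by this sketch.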

