Richi W

Reputation: 3656

Intermediate analyses of train in caret package

I am trying to train a model using the R package caret. My data set is rather large (600K rows), and training takes a very long time.

So far I use the code below. I do repeated CV (which is what train is for), but with only 2 repeats of 5-fold CV to speed things up. For the grid I use only a small set of values. Nevertheless it takes hours and hours. Is it possible to interrupt the training, look at the results so far, and then continue?

library(caret)

# 5-fold CV, repeated twice
short.train.ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)
# 4 candidate models: only interaction.depth varies
grid <- expand.grid(shrinkage = 0.1, n.trees = 500, n.minobsinnode = 1000,
                    interaction.depth = c(7, 8, 9, 10))
caret.train <- train(target ~ ., data = data[, filter],
                     method = "gbm", distribution = "adaboost",
                     tuneGrid = grid,
                     metric = "Accuracy",  # caret's metric names are case-sensitive
                     trControl = short.train.ctrl)

Upvotes: 1

Views: 272

Answers (1)

phiver

Reputation: 23608

Short answer: No.

Somewhat longer: caret has no way to interrupt a running train() call and resume it later. How would the program know where or when to stop?
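Since train() itself cannot be paused, one manual workaround is to call train() once per tuning-grid row and save each row's cross-validation summary to disk. You can then stop R at any point, look at the rows that have finished, and rerun the loop to continue where it left off. This is a sketch under stated assumptions, not a caret feature: the helper names checkpointed_search(), read_checkpoints() and search_question_grid() and the checkpoint directory are hypothetical. Fixing the folds with createMultiFolds() and passing them as trainControl(index = ...) keeps every row scored on identical splits, even across sessions.

```r
library(caret)

# Manual checkpointing (a workaround, not a caret feature): call train() once
# per tuneGrid row and save that row's CV summary to `ckpt_dir`. Rows whose
# file already exists are skipped, so rerunning after an interrupt resumes.
checkpointed_search <- function(form, df, grid, ctrl, ckpt_dir, ...) {
  dir.create(ckpt_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_len(nrow(grid))) {
    out_file <- file.path(ckpt_dir, sprintf("grid_row_%03d.rds", i))
    if (file.exists(out_file)) next  # finished in an earlier session
    fit <- train(form, data = df, tuneGrid = grid[i, , drop = FALSE],
                 trControl = ctrl, ...)
    saveRDS(fit$results, out_file)  # CV summary only, not the fitted model
  }
  invisible(ckpt_dir)
}

# Collect the rows finished so far, best Accuracy first (NULL if none yet).
read_checkpoints <- function(ckpt_dir) {
  files <- list.files(ckpt_dir, pattern = "\\.rds$", full.names = TRUE)
  if (length(files) == 0) return(NULL)
  res <- do.call(rbind, lapply(files, readRDS))
  res[order(-res$Accuracy), , drop = FALSE]
}

# Hypothetical wiring for the question's objects (defined here, not called).
# set.seed() fixes the folds, so every grid row, in every session, is scored
# on the same resamples via trainControl(index = ...).
search_question_grid <- function(data, filter, grid, ckpt_dir = "gbm_ckpt") {
  set.seed(1)
  folds <- createMultiFolds(data$target, k = 5, times = 2)
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                       index = folds)
  checkpointed_search(target ~ ., data[, filter], grid, ctrl, ckpt_dir,
                      method = "gbm", distribution = "adaboost",
                      metric = "Accuracy", verbose = FALSE)
  read_checkpoints(ckpt_dir)
}
```

Note the cost: each train() call also refits its row on the full data set, so this adds one final fit per grid row compared with a single train() call. Once the search is done, you can refit only the best row, for example with trainControl(method = "none").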

You are running 5-fold CV repeated twice, plus a grid search over four interaction.depth values with 500 trees each, on 600K records. This will take ages.
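A rough count of the gbm fits behind this setup, assuming caret fits each tuning-grid row once per resample and then refits the selected row on the full data:

$$
N_{\text{fits}} = \underbrace{2}_{\text{repeats}} \times \underbrace{5}_{\text{folds}} \times \underbrace{4}_{\text{interaction.depth values}} + \underbrace{1}_{\text{final model}} = 41
$$

Each of those fits grows 500 trees, and the 40 resampled fits each train on about 4/5 of the 600K rows (roughly 480K).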

Try running the resampling in parallel; that should speed things up quite a bit. Of course, you might then run into memory issues on your machine. But I would first run gbm once without CV (trainControl(method = "none")) to get a feel for how long a single fit takes, and take it from there.
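A minimal sketch of both suggestions, assuming the doParallel package as the foreach backend: train() distributes its resamples through foreach when a backend is registered and trainControl(allowParallel = TRUE), which is the default. The helper names time_single_fit(), run_in_parallel() and estimate_then_search() are hypothetical. With trainControl(method = "none"), caret requires tuneGrid to contain exactly one row.

```r
library(caret)
library(doParallel)  # also attaches foreach and parallel

# Time one gbm fit with no resampling. With method = "none", `params` must be
# a one-row data frame of tuning values.
time_single_fit <- function(form, df, params, ...) {
  ctrl <- trainControl(method = "none")
  timing <- system.time(
    fit <- train(form, data = df, method = "gbm", tuneGrid = params,
                 trControl = ctrl, verbose = FALSE, ...)
  )
  list(fit = fit, seconds = unname(timing["elapsed"]))
}

# Run `fun` with a PSOCK cluster registered as the foreach backend, so train()
# spreads its resamples over `n_workers` processes. Each worker is a separate
# R process with its own copy of the data, hence the memory pressure.
run_in_parallel <- function(n_workers, fun) {
  cl <- makePSOCKcluster(n_workers)
  registerDoParallel(cl)
  on.exit({
    stopCluster(cl)
    registerDoSEQ()  # back to sequential execution
  }, add = TRUE)
  fun()
}

# Hypothetical wiring for the question's objects (defined here, not called).
estimate_then_search <- function(data, filter, grid, ctrl, n_workers = 4) {
  one <- time_single_fit(target ~ ., data[, filter], grid[1, , drop = FALSE],
                         distribution = "adaboost")
  message("One fit on all rows took ", round(one$seconds), " s")
  run_in_parallel(n_workers, function()
    train(target ~ ., data = data[, filter], method = "gbm",
          distribution = "adaboost", tuneGrid = grid, metric = "Accuracy",
          trControl = ctrl, verbose = FALSE))
}
```

The single-fit time, multiplied by the number of resampled fits and divided by the number of workers, gives only a rough estimate, since each resampled fit sees fewer rows than the full data set.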

Upvotes: 3
