Reputation: 3656
I am trying to train a model using the R package caret. My data set is rather large (600K rows), and training takes a very long time.
So far I use the code below. I do repeated CV (this is what train is about), but only 2 repeats of 5-fold CV to speed it all up. For the grid I take only a rather small set of values. Nevertheless it takes hours and hours. Is there a way to interrupt the training, look at the results so far, and then continue?
library(caret)

# Repeated CV: 5 folds, 2 repeats
short.train.ctrl <- trainControl(method = "repeatedcv", repeats = 2, number = 5)
# Small grid: only interaction.depth varies
grid <- expand.grid(shrinkage = c(0.1), n.trees = c(500),
                    n.minobsinnode = c(1000), interaction.depth = c(7, 8, 9, 10))
caret.train <- train(target ~ ., data = data[, filter],
                     method = "gbm", distribution = "adaboost",
                     tuneGrid = grid,
                     metric = "Accuracy",  # caret's metric name is capitalized
                     trControl = short.train.ctrl)
Upvotes: 1
Views: 272
Reputation: 23608
Short answer: No.
Somewhat longer: caret has no way to interrupt a train() call and resume it later. How should the program know where or when to stop?
You are running 2 repeats of 5-fold CV over a grid of 4 parameter combinations, and every model fits 500 trees on samples drawn from 600K records. This will take ages.
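To put a number on that, the model fits implied by the settings in the question can be counted directly. This is a small sketch in base R, assuming caret fits one model per grid row per resample and then refits once on the full data; `n_folds`, `n_repeats` and the count variables are illustrative names, not part of caret.

```r
# Tuning grid from the question: only interaction.depth varies.
grid <- expand.grid(shrinkage = 0.1,
                    n.trees = 500,
                    n.minobsinnode = 1000,
                    interaction.depth = c(7, 8, 9, 10))

n_folds   <- 5  # trainControl(number = 5)
n_repeats <- 2  # trainControl(repeats = 2)

# One fit per grid row per resample (each on 4/5 of the rows) ...
n_resample_fits <- nrow(grid) * n_folds * n_repeats
# ... plus the final refit of the best combination on all rows.
n_total_fits <- n_resample_fits + 1

n_total_fits  # 41
```

So even this small grid means roughly 41 gbm runs of 500 trees each, which is why a single-run timing is a useful first measurement.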
Try running everything in parallel. That should speed things up quite a bit, although you might then run into memory issues on your machine. But I would first run the gbm without CV (trainControl(method = "none")) to get a feel for the time a single run takes, and take it from there.
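Both suggestions could look roughly like the sketch below. It assumes the caret, gbm and doParallel packages are installed and that the data frame has a two-class `target` column as in the question. The helper names `time_single_fit` and `train_in_parallel`, the default depth and the worker count are made up for illustration. Note that `method = "none"` only accepts a tuning grid with a single row.

```r
# Hypothetical helper: fit one gbm with fixed parameters and no resampling,
# and return the elapsed wall-clock time in seconds.
time_single_fit <- function(df, depth = 7) {
  # method = "none" requires exactly one parameter combination.
  one_row <- data.frame(shrinkage = 0.1, n.trees = 500,
                        n.minobsinnode = 1000, interaction.depth = depth)
  timing <- system.time(
    caret::train(target ~ ., data = df,
                 method = "gbm", distribution = "adaboost",
                 tuneGrid = one_row,
                 trControl = caret::trainControl(method = "none"),
                 verbose = FALSE)
  )
  timing[["elapsed"]]
}

# Hypothetical helper: run the full grid search on a local cluster.
# Each worker holds its own copy of the data, so memory use grows
# with the number of workers (2 here is an arbitrary assumption).
train_in_parallel <- function(df, grid, ctrl, workers = 2) {
  cl <- parallel::makePSOCKcluster(workers)
  doParallel::registerDoParallel(cl)
  # Always shut the workers down and fall back to sequential execution.
  on.exit({
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()
  }, add = TRUE)
  caret::train(target ~ ., data = df,
               method = "gbm", distribution = "adaboost",
               tuneGrid = grid, trControl = ctrl,
               verbose = FALSE)
}
```

Calling `time_single_fit(data[, filter])` first gives a rough per-model cost. Multiplying it by the number of fits in the full search gives a ballpark for the sequential run time before committing to `train_in_parallel(data[, filter], grid, short.train.ctrl)`.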
Upvotes: 3