Reputation: 334
I've built some models in caret and am trying to assess which one is fastest at making predictions on unseen data, so that I have an idea of which model will predict fastest when deployed. I know I can extract the training time from a caret model using modelname$times; however, I also want to assess how quick each model is at making predictions. I'm a little unsure of the best approach, and at present I have done the following:
# model 1
S_time <- Sys.time()
preds <- model1 %>% predict(testdata)
E_time <- Sys.time()
pred_time1 <- E_time - S_time   # elapsed wall-clock time for model 1

# model 2
S_time <- Sys.time()
preds <- model2 %>% predict(testdata)
E_time <- Sys.time()
pred_time2 <- E_time - S_time   # elapsed wall-clock time for model 2
I can then make a direct comparison between the times and assess how quick each model is at making predictions on unseen data. I can then use this prediction time as an additional criterion when evaluating the best model (alongside the other metrics such as AIC, RMSE, MAE, R², etc.). Is there another way of doing this, or is this the correct approach?
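For reference, base R's system.time() wraps the same start/stop bookkeeping in a single call; a minimal sketch, assuming the same model1 and testdata objects as above:
# system.time() returns user, system and elapsed times;
# "elapsed" is the wall-clock time the prediction took
t1 <- system.time(preds <- predict(model1, testdata))
pred_time1 <- t1[["elapsed"]]   # seconds for model 1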
Upvotes: 0
Views: 221
Reputation: 47008
You can use microbenchmark and write a function that takes the model as an input. For example, if we create two models:
library(microbenchmark)
library(caret)
library(magrittr)

data(cars)                       # Kelley Blue Book car data from caret
idx <- sample(nrow(cars), 500)   # random 500-row training split
traindata <- cars[idx, ]
testdata  <- cars[-idx, ]

model1 <- train(Price ~ ., data = traindata, method = "gbm",
                trControl = trainControl(method = "cv"))
model2 <- train(Price ~ ., data = traindata, method = "rf",
                trControl = trainControl(method = "cv"))
Then have a function that takes in a model and the test data:
predict_func <- function(mdl, data) {
  mdl %>% predict(data)   # wraps predict() so each model is timed identically
}
We can feed it into microbenchmark():
microbenchmark(model1 = predict_func(model1, testdata),
               model2 = predict_func(model2, testdata))
Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
 model1  2.394427  2.507652  2.944221  2.710818  2.900977  9.967533   100  a
 model2 10.281856 12.174207 13.716903 13.121166 14.651236 23.971576   100   b
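By default microbenchmark() runs each expression 100 times (the neval column). If you want a different number of repetitions or a fixed reporting unit, it takes times and unit arguments; a minimal sketch, reusing the objects above:
# 50 repetitions per model, results always reported in milliseconds
res <- microbenchmark(model1 = predict_func(model1, testdata),
                      model2 = predict_func(model2, testdata),
                      times = 50, unit = "ms")
summary(res)   # min/lq/mean/median/uq/max per expression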
Upvotes: 1