Kamesh

Random forest model in R

Is there a way to create multiple random forest models by tuning hyperparameters on the training data, check each model's performance on the test data, and store the results in a CSV file?

For example, I have one model with mtry = 6 and nodesize = 3, and another with mtry = 10 and nodesize = 4. I need to test both models on the test data and store the key metrics, such as the confusion matrix, sensitivity, and specificity.

I have tried the following code:

library(randomForest)
library(caret)

train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()

for (mtry in c(6,11)){
  for (nodesize in c(2,3)){
    fit_model <- randomForest(DV~., train_final, mtry = mtry, importance=TRUE, nodesize=nodesize,
                              sampsize = ceiling(.8*nrow(train_final)), proximity=TRUE, na.action = na.omit,
                              ntree=500)
    # Store each fitted model under a "mtry-nodesize" key
    Key_col <- paste0(mtry,"-",nodesize)
    modellist[[Key_col]] <- fit_model

    pred_train <- predict(fit_model, train_final)
    cf <- confusionMatrix(pred_train, train_final$DV, mode = 'everything', positive = '1')
    # cf$table is Prediction x Reference; with levels "0","1" the cells are TN, FP, FN, TP
    train_performance$TN <- cf$table[1]
    train_performance$FP <- cf$table[2]
    train_performance$FN <- cf$table[3]
    train_performance$TP <- cf$table[4]
    train_performance$accuracy=cf$overall[1]
    train_performance$kappa=cf$overall[2]
    train_performance$sensitivity=cf$byClass[1]
    train_performance$specificity=cf$byClass[2]
    train_performance$key=Key_col
  }
}

Answers (1)

ashwin agrawal

Below is a sample approach that uses the caret package to tune and train a random forest model and report accuracy metrics for every candidate model:

library(randomForest)
library(mlbench)
library(caret)

# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
# Create model with default parameters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_default)

Output:

Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD 
  0.8138384  0.6209924  0.0747572    0.1569159

Tune Using Caret:

Random Search: One strategy is to try random values within a range.

# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)

Output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
  11    0.8218470  0.6365181  0.09124610   0.1906693
  14    0.8140620  0.6215867  0.08475785   0.1750848
  17    0.8030231  0.5990734  0.09595988   0.1986971
  24    0.8042929  0.6002362  0.09847815   0.2053314
  30    0.7933333  0.5798250  0.09110171   0.1879681
  34    0.8015873  0.5970248  0.07931664   0.1621170
  45    0.7932612  0.5796828  0.09195386   0.1887363
  47    0.7903896  0.5738230  0.10325010   0.2123314
  49    0.7867532  0.5673879  0.09256912   0.1899197
  50    0.7775397  0.5483207  0.10118502   0.2063198
  60    0.7790476  0.5513705  0.09810647   0.2005012

[Figure: plot(rf_random), accuracy vs. mtry]

Grid Search: Another strategy is to define a grid of parameter values to try.

control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)

Output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
   1    0.8377273  0.6688712  0.07154794   0.1507990
   2    0.8378932  0.6693593  0.07185686   0.1513988
   3    0.8314502  0.6564856  0.08191277   0.1700197
   4    0.8249567  0.6435956  0.07653933   0.1590840
   5    0.8268470  0.6472114  0.06787878   0.1418983
   6    0.8298701  0.6537667  0.07968069   0.1654484
   7    0.8282035  0.6493708  0.07492042   0.1584772
   8    0.8232828  0.6396484  0.07468091   0.1571185
   9    0.8268398  0.6476575  0.07355522   0.1529670
  10    0.8204906  0.6346991  0.08499469   0.1756645
  11    0.8073304  0.6071477  0.09882638   0.2055589
  12    0.8184488  0.6299098  0.09038264   0.1884499
  13    0.8093795  0.6119327  0.08788302   0.1821910
  14    0.8186797  0.6304113  0.08178957   0.1715189
  15    0.8168615  0.6265481  0.10074984   0.2091663

[Figure: plot(rf_gridsearch), accuracy vs. mtry]
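A caret `train` object keeps the resampled metrics for every candidate in its `results` data frame, so storing the results of all models can be a single `write.csv()` call. This is a minimal sketch on the same Sonar data; the smaller grid, 5-fold CV, `ntree = 100` and the file name `rf_gridsearch_results.csv` are illustrative assumptions chosen so it runs quickly, not settings from the original answer.

```r
library(randomForest)
library(mlbench)
library(caret)

data(Sonar)
set.seed(7)
# Smaller settings than above so the sketch runs quickly (illustrative only)
control <- trainControl(method = "cv", number = 5, search = "grid")
tunegrid <- expand.grid(mtry = c(2, 4, 6))
rf_gridsearch <- train(Class ~ ., data = Sonar, method = "rf", metric = "Accuracy",
                       tuneGrid = tunegrid, trControl = control,
                       ntree = 100)  # extra arguments are passed on to randomForest()

# One row per mtry value: Accuracy, Kappa and their standard deviations across folds
print(rf_gridsearch$results)
write.csv(rf_gridsearch$results, "rf_gridsearch_results.csv", row.names = FALSE)

# The mtry value caret selected by the chosen metric
print(rf_gridsearch$bestTune)
```

These are cross-validated estimates on the training data; scoring a held-out test set still needs `predict()` on that set.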

There are many other ways to tune a random forest model and store the results of each model; the two above are among the most common.

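Note that caret's "rf" method tunes only mtry; other randomForest() arguments such as nodesize and ntree can be passed through train() as fixed values. To compare several nodesize values, as in the question, you can also set the parameters manually and train and evaluate each model yourself.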
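A minimal sketch of the manual approach for the question's mtry/nodesize combinations, evaluated on held-out data and saved to CSV. It assumes the randomForest, caret and mlbench packages are installed, and uses a 70/30 split of Sonar as a stand-in for the question's `train_final` and a hypothetical `test_final`; "M" is taken as the positive class. Each model's metrics go into their own list element, so nothing is overwritten, and the rows are bound together at the end.

```r
library(randomForest)
library(caret)    # createDataPartition(), confusionMatrix()
library(mlbench)  # Sonar data, a stand-in for the question's data

data(Sonar)
set.seed(7)
# Hypothetical 70/30 split standing in for train_final / test_final
idx <- createDataPartition(Sonar$Class, p = 0.7, list = FALSE)
train_final <- Sonar[idx, ]
test_final  <- Sonar[-idx, ]

grid <- expand.grid(mtry = c(6, 10), nodesize = c(3, 4))
modellist <- list()
rows <- vector("list", nrow(grid))

for (i in seq_len(nrow(grid))) {
  key <- paste0(grid$mtry[i], "-", grid$nodesize[i])
  fit <- randomForest(Class ~ ., data = train_final,
                      mtry = grid$mtry[i], nodesize = grid$nodesize[i],
                      ntree = 500)
  modellist[[key]] <- fit

  # Score on the test set, not the training set
  pred <- predict(fit, newdata = test_final)
  cf <- confusionMatrix(pred, test_final$Class, positive = "M")
  tab <- cf$table  # rows = Prediction, columns = Reference
  rows[[i]] <- data.frame(
    key = key, mtry = grid$mtry[i], nodesize = grid$nodesize[i],
    TN = tab["R", "R"], FP = tab["M", "R"],
    FN = tab["R", "M"], TP = tab["M", "M"],
    accuracy    = unname(cf$overall["Accuracy"]),
    kappa       = unname(cf$overall["Kappa"]),
    sensitivity = unname(cf$byClass["Sensitivity"]),
    specificity = unname(cf$byClass["Specificity"])
  )
}

# One row per model, written to a CSV file
test_performance <- do.call(rbind, rows)
write.csv(test_performance, "test_performance.csv", row.names = FALSE)
print(test_performance)
```

With your own data, replace `Class` with your outcome column and the level names "M"/"R" with yours (for example "1"/"0").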
