Sound

Reputation: 95

Dynamic response variable for random forest

I'm trying to create a dynamic ML app that allows the user to upload a dataset to get a prediction of the first column in the dataset, using a random forest model.

I am having problems with the randomForest() function, specifically when I try to specify the response variable as the first column of the dataset. For the example below, I use the iris dataset, with the response variable, Species, moved to the first column.

This was my attempt:

model <- randomForest(names(DATA[1]) ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)

However, this does not work. The error I get is:

Error: variable lengths differ (found for 'Species')

The app and function only seem to work when I specify the response variable manually, like this:

model <- randomForest(Species ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)

I have tried to use the paste() function to work some magic, but I didn't succeed.

How should I write the code in order to get it to work?

Upvotes: 0

Views: 116

Answers (1)

neilfws

Reputation: 33782

It looks like you want to build a formula from a string. You can use eval and parse to do that. Something like this should work:

model <- randomForest(eval(parse(text = paste(names(DATA)[1], "~ ."))), 
                      data = DATA, ntree = 500, mtry = 3, importance = TRUE)
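The original attempt most likely fails because names(DATA[1]) on the left-hand side is evaluated as an expression that returns the single string "Species", so the response has length 1 while the predictors have one value per row. If you would rather avoid eval(parse()), a minimal sketch of an alternative is to build the formula object with reformulate() from base R (as.formula(paste(names(DATA)[1], "~ .")) should behave the same way). The reordering of iris below is only an assumption standing in for the uploaded dataset, and set.seed() is there just to make the run repeatable:

```r
library(randomForest)

# Stand-in for the uploaded data: iris with Species moved to column 1
# (an assumption for illustration, mirroring the setup in the question).
DATA <- iris[, c(5, 1:4)]

# Build the formula `Species ~ .` from the name of the first column.
f <- reformulate(".", response = names(DATA)[1])

set.seed(42)  # only for reproducibility of this sketch
model <- randomForest(f, data = DATA, ntree = 500, mtry = 3, importance = TRUE)
```

Because f is an ordinary formula object, the printed Call will show formula = f rather than the column name.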

Example using original iris dataset:

model <- randomForest(eval(parse(text = paste(names(iris)[5], "~ ."))), 
                      data = iris, ntree = 500, mtry = 3, importance = TRUE)

model

Call:
 randomForest(formula = eval(parse(text = paste(names(iris)[5],      "~ ."))), data = iris, 
              ntree = 500, mtry = 3, importance = TRUE) 
           Type of random forest: classification
                 Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
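For an app where the response is always the first column, another option is to skip the formula entirely and use the x/y interface of randomForest(). This is only a sketch: it assumes the first column is already a factor (for classification) or numeric (for regression), and it again uses a reordered iris as a stand-in for the uploaded data:

```r
library(randomForest)

# Stand-in for the uploaded data: response in column 1, predictors after it.
DATA <- iris[, c(5, 1:4)]

set.seed(42)  # only for reproducibility of this sketch
model_xy <- randomForest(
  x = DATA[, -1, drop = FALSE],  # every column except the first as predictors
  y = DATA[[1]],                 # first column as the response
  ntree = 500, mtry = 3, importance = TRUE
)
```

If an uploaded file reads the response in as character, converting it with factor() before the call is probably needed to get a classification forest.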

Upvotes: 2
