Reputation: 95
I'm trying to create a dynamic ML app that allows the user to upload a dataset to get a prediction of the first column in the dataset, using a random forest model.
I am having problems with the randomForest() function, specifically when I try to specify the response variable as the first column of the dataset. For the example below, I use the iris dataset with the response variable, Species, moved to the first column.
This was my attempt:
model <- randomForest(names(DATA[1]) ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)
However, this does not work. The error I get is:
Error: variable lengths differ (found for 'Species')
The app and function only seem to work when I specify the response variable manually like this:
model <- randomForest(Species ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)
I have tried to use the paste() function to work some magic, but I didn't succeed.
How should I write the code in order to get it to work?
Upvotes: 0
Views: 116
Reputation: 33782
It looks like you want to build a formula from a string. You can use eval() and parse() to do that. Something like this should work:
model <- randomForest(eval(parse(text = paste(names(DATA)[1], "~ ."))),
data = DATA, ntree = 500, mtry = 3, importance = TRUE)
Example using original iris dataset:
model <- randomForest(eval(parse(text = paste(names(iris)[5], "~ ."))),
data = iris, ntree = 500, mtry = 3, importance = TRUE)
model
Call:
randomForest(formula = eval(parse(text = paste(names(iris)[5], "~ ."))), data = iris,
ntree = 500, mtry = 3, importance = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
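As a possible alternative to eval(parse()), base R can build the same formula object directly from the column name. The sketch below is only an illustration under some assumptions: DATA is a stand-in built from iris with Species moved to the first column (as described in the question), and the names f1 and f2 are made up for the example. As far as I can tell, the original attempt fails because names(DATA[1]) evaluates to a single character string rather than a column of the data, so passing a real formula such as Species ~ . avoids the length mismatch.

```r
# Stand-in for the uploaded dataset: iris with Species moved to column 1.
DATA <- iris[, c(5, 1:4)]

# Option 1: reformulate() builds a formula from term labels and a response name.
f1 <- reformulate(".", response = names(DATA)[1])

# Option 2: as.formula() converts a pasted string into a formula.
f2 <- as.formula(paste(names(DATA)[1], "~ ."))

print(f1)
print(f2)

# Fit the model only if the randomForest package happens to be installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  model <- randomForest::randomForest(f1, data = DATA, ntree = 500,
                                      mtry = 3, importance = TRUE)
  print(model)
}
```

Either formula can be passed as the first argument to randomForest(). One side effect worth noting is that the printed Call then shows the variable name (f1) instead of the long eval(parse(...)) expression.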
Upvotes: 2