learner

Reputation: 2742

How to handle variable names that start with a number in the randomForest package

In the toy example below, I renamed the variable cyl to 1_cyl. I am doing this because my actual data contains some variables whose names start with a number. When I pass a formula that uses this name to randomForest, I get the error shown below. Other functions work fine with the same formula.

How can I solve this problem?

library(randomForest)
library(rpart)

data(mtcars)
colnames(mtcars)[2] <- '1_cyl'
colnames(mtcars)
#[1] "mpg"   "1_cyl" "disp"  "hp"    "drat"  "wt"    "qsec"  "vs"    "am"    "gear"  "carb"
(fmla <- as.formula("mpg ~ `1_cyl` + hp"))
randomForest(fmla, data = mtcars, importance = TRUE, na.action = na.exclude)
#Error in eval(expr, envir, enclos) : object '1_cyl' not found

# Other functions work with the same formula
rpart(fmla, data = mtcars)
glm(fmla, data = mtcars)

Upvotes: 0

Views: 4105

Answers (1)

Hong Ooi

Reputation: 57686

randomForest.formula calls reformulate internally, for some reason, and it looks like that step doesn't cope with nonstandard names. (It also calls model.frame twice.)
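As a rough illustration of how a backticked name can get lost when a formula is rebuilt from its term labels, here is a base R sketch. It does not call randomForest at all. The assumption (not confirmed above) is that the data is passed through data.frame() somewhere along the way, which by default rewrites non-syntactic column names. The names df, rebuilt, labels and res are just for this example.

```r
# Base R only. Assumption: the data gets re-wrapped with data.frame(),
# whose default check.names = TRUE makes column names syntactic.
df <- mtcars
colnames(df)[2] <- "1_cyl"

rebuilt <- data.frame(df)
colnames(rebuilt)[2]   # the column is now called "X1_cyl"

# Rebuild a one-sided formula from the term labels of the original one.
fmla <- mpg ~ `1_cyl` + hp
labels <- attr(terms(fmla), "term.labels")

# The rebuilt formula still refers to `1_cyl`, which no longer exists
# in the re-wrapped data, so building the model frame fails.
res <- try(model.frame(reformulate(labels), rebuilt), silent = TRUE)
inherits(res, "try-error")
```

If this is what happens internally, it would also explain why rpart and glm are unaffected: they build the model frame once from the original data.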

You can get around this by calling randomForest without a formula, but with a model matrix and response variable. When you use a formula this is what happens anyway; randomForest.formula is just a convenience wrapper that builds the model matrix for you.

randomForest(mtcars[, -1], mtcars[, 1])
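To match the formula in the question more closely, here is a sketch of two options: the x/y interface restricted to the two predictors, and, as an alternative not mentioned above, renaming the columns with make.names() so the formula interface can still be used. The object names rf_xy, rf_fm and mt2 are made up for this example, and set.seed is only there to make the runs repeatable.

```r
library(randomForest)

data(mtcars)
colnames(mtcars)[2] <- "1_cyl"

# Option 1: skip the formula and pass predictors and response directly.
set.seed(1)
rf_xy <- randomForest(x = mtcars[, c("1_cyl", "hp")],
                      y = mtcars$mpg,
                      importance = TRUE)

# Option 2: make the column names syntactic, then use a formula as usual.
mt2 <- mtcars
colnames(mt2) <- make.names(colnames(mt2))   # "1_cyl" becomes "X1_cyl"
set.seed(1)
rf_fm <- randomForest(mpg ~ X1_cyl + hp, data = mt2, importance = TRUE)
```

Option 1 keeps the original column names in the importance output; option 2 keeps the formula interface (including na.action) at the cost of renamed columns.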

Upvotes: 2
