Fluxy

Reputation: 2978

Random Forest does not accept feature names

I am trying to train a random forest, but I am running into a problem with the variable names:

library("randomForest")

f <- "~ var1_testTRUE + var2_root_subj. + var3_test.en-US"
rf <- randomForest(as.formula(f), data=dtrain, ntree=10, nodesize=10)

This is the error message:

Error in eval(predvars, data, env) : object 'var3_test.en' not found

It's not clear to me why -US is not appended to the feature name.

How can I fix it?

Upvotes: 0

Views: 119

Answers (1)

dave-edison

Reputation: 3726

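var3_test.en-US is a non-syntactic name: R parses the hyphen as the minus operator, so the formula ends up looking for a variable called var3_test.en and treats US as a separate term to remove. To fix it, you need to surround the name with backticks. You can see that, as written, your formula isn't being parsed the way you want: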

as.formula("~ var1_testTRUE + var2_root_subj. + var3_test.en-US")
# ~var1_testTRUE + var2_root_subj. + var3_test.en - US

With backticks it gets parsed correctly:

as.formula("~ var1_testTRUE + var2_root_subj. + `var3_test.en-US`")
# ~var1_testTRUE + var2_root_subj. + `var3_test.en-US`
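
If the formula is built from a vector of column names, the quoting can be done programmatically. The following is only a sketch using base R: the vector vars is a hypothetical stand-in for your real column names, and both options are suggestions rather than anything randomForest itself requires. all.vars() is used to show which variables R actually sees in each formula.

```r
# Hypothetical feature names (assumed for illustration);
# the third is non-syntactic because of the hyphen.
vars <- c("var1_testTRUE", "var2_root_subj.", "var3_test.en-US")

# Without backticks, the hyphen is parsed as minus and US becomes its own variable.
bad <- as.formula(paste("~", paste(vars, collapse = " + ")))
bad_vars <- all.vars(bad)

# Option 1: wrap every name in backticks before building the formula.
quoted <- paste0("`", vars, "`")
f <- as.formula(paste("~", paste(quoted, collapse = " + ")))
good_vars <- all.vars(f)  # the hyphenated name survives as one variable

# Option 2: rename the columns to syntactic names up front,
# e.g. names(dtrain) <- make.names(names(dtrain), unique = TRUE)
clean <- make.names(vars, unique = TRUE)
```

Option 1 keeps the original column names; option 2 changes them (the hyphen becomes a dot), so any later code that refers to the old names would need to be updated as well.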

Upvotes: 1
