randomForest() machine learning in R

Question

I am experimenting with the function randomForest() in R, and several articles I found all suggest logic similar to the code below, where the response variable is column 30 and the independent variables are all the other columns:

dat.rf <- randomForest(dat[,-30], 
                      dat[,30], 
                      proximity=TRUE, 
                      mtry=3,
                      importance=TRUE,
                      do.trace=100,
                      na.action = na.omit)

When I run this, I get the following error and warning:

Error in randomForest.default(dat[, -30], dat[, 30], proximity = TRUE, :
  NA not permitted in predictors
In addition: Warning message:
In randomForest.default(dat[, -30], dat[, 30], proximity = TRUE, :
  The response has five or fewer unique values. Are you sure you want to do regression?
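A likely reading of these two messages, offered as a hedged interpretation of how the randomForest package usually behaves: the x/y (default) method stops as soon as the predictors contain NA, and `na.action` only takes effect in the formula method, so `na.action = na.omit` is not applied here. Separately, `dat[,30]` is numeric, so the forest is grown for regression, which triggers the warning. The sketch below uses made-up toy data (the data frame and its columns are hypothetical stand-ins for `dat`, with the response in column 5 instead of 30). It drops incomplete rows with `na.omit()` and converts the response with `as.factor()` before calling the x/y interface. The model fit only runs if the randomForest package is installed.

```r
# Hypothetical toy data standing in for `dat`; the response is column 5 here
# (column 30 in the question).
dat <- data.frame(
  X1 = c(1.2, 3.4, NA, 2.2, 5.1, 4.4, 0.8, 2.9),
  X2 = c(10, 12, 11, 15, NA, 9, 14, 13),
  X3 = c(0.5, 0.7, 0.2, 0.9, 0.4, 0.6, 0.1, 0.3),
  X4 = c(7, 3, 5, 2, 8, 6, 4, 1),
  Y  = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Remove incomplete rows up front, because the x/y interface does not apply
# na.action for you (assumption based on randomForest.default's behaviour).
dat_cc <- na.omit(dat)

x <- dat_cc[, -5]            # predictors: every column except the response
y <- as.factor(dat_cc[, 5])  # factor response -> classification, not regression

# Only fit if the package is available, so the sketch runs without it.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)
  dat.rf <- randomForest::randomForest(x, y,
                                       proximity = TRUE,
                                       mtry = 3,
                                       importance = TRUE,
                                       ntree = 100)
  print(dat.rf)
}
```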

However, it works when I list the independent variables one by one in a formula, keeping all the other arguments the same:

dat.rf <- randomForest(as.factor(Y) ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + ......,
                       data = dat,
                       proximity = TRUE,
                       mtry = 3,
                       importance = TRUE,
                       do.trace = 100,
                       na.action = na.omit)

Could someone help me debug the simpler command, so that I don't have to list each predictor one by one?
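For the shorter call, R's formula shorthand `.` stands for every column of `data` not already used in the formula, so the predictors do not have to be listed. A minimal sketch on hypothetical toy data (the data frame and its columns are stand-ins for `dat`, assuming the response column is named `Y` as in the working formula). It converts `Y` to a factor in the data frame first, rather than writing `as.factor(Y)` inside the formula, so that `.` plainly excludes the response. It then uses `terms()` to show what `.` expands to. The fit itself only runs if randomForest is installed.

```r
# Hypothetical toy data standing in for `dat`, with response column Y.
dat <- data.frame(
  X1 = c(1.2, 3.4, NA, 2.2, 5.1, 4.4, 0.8, 2.9),
  X2 = c(10, 12, 11, 15, NA, 9, 14, 13),
  X3 = c(0.5, 0.7, 0.2, 0.9, 0.4, 0.6, 0.1, 0.3),
  X4 = c(7, 3, 5, 2, 8, 6, 4, 1),
  Y  = c(0, 1, 0, 1, 1, 0, 1, 0)
)
dat$Y <- as.factor(dat$Y)  # factor response -> classification, not regression

# `.` expands to every column of `dat` except the response Y.
rhs <- attr(terms(Y ~ ., data = dat), "term.labels")
print(rhs)

# Formula interface: here na.action is honoured, so NA rows are dropped.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)
  dat.rf <- randomForest::randomForest(Y ~ ., data = dat,
                                       proximity = TRUE,
                                       mtry = 3,
                                       importance = TRUE,
                                       ntree = 100,
                                       na.action = na.omit)
  print(dat.rf)
}
```

The design choice here is to let the formula method handle both the missing values and the predictor list, which is what made the long formula version work.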
