Najme Rastegar

Reputation: 31

Subscript out of bounds error in predict function of randomForest

I am using a random forest for prediction, and the predict(fit, test_feature) line raises the following error. Can someone help me fix it? I followed the same steps with another dataset and had no error, but this one fails.

Error: Error in x[, vname, drop = FALSE] : subscript out of bounds

library(caret)         # createDataPartition()
library(randomForest)  # randomForest(), predict()

training_index <- createDataPartition(shufflled[,487], p = 0.8, times = 1)
training_index <- unlist(training_index)

train_set <- shufflled[training_index,]
test_set <- shufflled[-training_index,]

accuracies<- c()
k=10
n= floor(nrow(train_set)/k)

for(i in 1:k){
  sub1<- ((i-1)*n+1)
  sub2<- (i*n)
  subset<- sub1:sub2
  train<- train_set[-subset, ]
  test<- train_set[subset, ]
  test_feature<- test[ ,-487]

  True_Label<- as.factor(test[ ,487])
  fit<- randomForest(x= train[ ,-487], y= as.factor(train[ ,487]))

  prediction<- predict(fit, test_feature)  #The error line
  correctlabel<- prediction == True_Label
  accuracies<- c(accuracies, mean(correctlabel))  # fold accuracy
  t<- table(prediction, True_Label)
}

Upvotes: 0

Views: 5240

Answers (4)

lux

Reputation: 1

Add the expression

dimnames(test_feature) <- NULL

before

prediction <- predict(fit, test_feature)

Upvotes: -1

phoebe

Reputation: 11

Are there identical column names in your training and validation x?

I had the same error message and solved it by renaming my columns: my data was a matrix and its column names were all empty, i.e. "".
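
A minimal sketch of that check, using base R only. The matrices `train_x` and `test_x` and the helper `has_usable_names()` are hypothetical stand-ins (they are not from the question); in the asker's code they would correspond to `train[ ,-487]` and `test_feature`. The idea is to confirm that the column names are present, non-empty, and unique, and to assign the same explicit names to both sets before fitting.

```r
# Hypothetical stand-ins for the training and test predictors.
train_x <- matrix(rnorm(20), nrow = 5)  # 5 rows, 4 columns, no column names
test_x  <- matrix(rnorm(8),  nrow = 2)  # 2 rows, 4 columns, no column names

# TRUE only if column names exist, are non-empty, and contain no duplicates.
has_usable_names <- function(x) {
  nm <- colnames(x)
  !is.null(nm) && !anyNA(nm) && all(nzchar(nm)) && anyDuplicated(nm) == 0
}

names_ok_before <- has_usable_names(train_x)  # FALSE: the matrix has no names

# Give both matrices the same explicit, unique names before calling randomForest().
feature_names <- paste0("V", seq_len(ncol(train_x)))
colnames(train_x) <- feature_names
colnames(test_x)  <- feature_names
```

Whether this resolves the error depends on the data; it only rules out missing, empty, or duplicated column names as the cause.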

Upvotes: 1

Loncar

Reputation: 127

I had a similar problem a few weeks ago.

To work around the problem, you can do this:

df$label <- factor(df$label)

Instead of as.factor(), try the plain factor() function. Also, try naming your label variable first.
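
One difference between the two calls that may be relevant: when the input is already a factor, as.factor() returns it unchanged, including levels that no longer occur, while factor() rebuilds the levels from the values actually present. The small example below only illustrates that difference; the vector names are hypothetical, and it is an assumption that unused levels play a role in the question's data.

```r
# Hypothetical label vector with three classes.
labels <- factor(c("a", "b", "c", "a"))

# After subsetting, "c" no longer occurs but is still recorded as a level.
subset_labels <- labels[labels != "c"]

kept_levels    <- levels(as.factor(subset_labels))  # as.factor() keeps unused levels
rebuilt_levels <- levels(factor(subset_labels))     # factor() drops them
```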

Upvotes: 1

Lorenzo Benassi

Reputation: 621

Your question is not very clear, but I will try to help. First, check your data to see the distribution of levels in your predictors and outcomes. You may find that some predictor or outcome levels are highly skewed or very rare. I got that error when I was trying to predict a very rare outcome with a heavily tuned random forest: some of the predictor levels were not actually present in the training data. As a result, a factor level appeared in the test data that the model fitted on the training data treated as out of bounds.
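
As a rough sketch of how to look for such levels, the base R snippet below lists, for each factor column, the values that occur in the test set but never in the training set. The toy data frames and the column name `colour` are hypothetical and only stand in for the real train and test sets.

```r
# Hypothetical toy data; `colour` is an illustrative factor predictor.
train <- data.frame(colour = factor(c("red", "blue", "red")), x = c(1, 2, 3))
test  <- data.frame(colour = factor(c("red", "green")),       x = c(4, 5))

# Names of the factor columns in the training data.
factor_cols <- names(train)[sapply(train, is.factor)]

# For each factor column, collect values seen in test but not in train.
unseen <- lapply(
  setNames(factor_cols, factor_cols),
  function(col) setdiff(unique(as.character(test[[col]])),
                        unique(as.character(train[[col]])))
)

# Keep only the columns that actually have unseen values.
unseen <- Filter(length, unseen)
print(unseen)
```

An empty list means every factor value in the test set was also seen during training, so this particular cause can be ruled out.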

Alternatively, check the names of your variables before calling predict() to make sure they match. Without your data files, it's hard to tell why your first example worked. For example, you can try:

names(test) <- names(train)
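
Before overwriting the names, it may help to see exactly which ones differ, since a blind assignment would mislabel the data if the columns were in a different order. A small sketch under that assumption, with hypothetical toy data frames and column names:

```r
# Hypothetical toy data; note the "F2" vs "f2" mismatch.
train <- data.frame(f1 = 1:3, f2 = 4:6,  label = c("a", "b", "a"))
test  <- data.frame(f1 = 7:8, F2 = 9:10, label = c("b", "a"))

missing_in_test <- setdiff(names(train), names(test))  # expected but absent
extra_in_test   <- setdiff(names(test), names(train))  # present but unexpected

# Overwrite only when both sets have the same number of columns
# and the columns are known to be in the same order.
if (ncol(test) == ncol(train)) {
  names(test) <- names(train)
}
```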

Upvotes: 0
