Jake

Reputation: 453

R SVM return NA for predictions with missing data

I am attempting to make predictions using a trained SVM from package e1071 but my data contains some missing values (NA).

I would like the returned prediction to be NA for any instance that has missing values. I tried to use na.action = na.pass as below, but it gives me the error "Error in names(ret2) <- rowns : 'names' attribute [150] must be the same length as the vector [149]".

If I use na.omit then I can get predictions without instances with missing data. How can I get predictions including NAs?

library(e1071)
model <- svm(Species ~ ., data = iris)
print(length(predict(model, iris)))
tmp <- iris
tmp[1, "Sepal.Length"] <- NA
print(length(predict(model, tmp, na.action = na.pass)))

Upvotes: 6

Views: 3322

Answers (3)

Brian D

Reputation: 2719

You can take advantage of the fact that the predict output includes the original row numbers in the names() attribute:

pred <- predict(model, tmp)
tmp[names(pred), "predict"] <- pred
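
If you want a full-length vector rather than a new column, the same idea can be sketched as below. This is only an illustration built on the question's iris example; it assumes the data frame has unique row names and that predict() keeps them as names, as described above. The variable names pred and full are made up for the example.

```r
library(e1071)

model <- svm(Species ~ ., data = iris)
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

# Default na.action = na.omit drops row 1, leaving 149 named predictions.
pred <- predict(model, tmp)

# Indexing by row name returns NA for any name that is not in names(pred).
full <- pred[rownames(tmp)]
names(full) <- rownames(tmp)

print(length(full))
print(is.na(full[1]))
```

Note that na.omit is applied to the whole data frame passed to predict(), so a missing value in any column of tmp (not only the predictors) drops that row.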

Upvotes: 0

thelatemail

Reputation: 93813

You could just assign all the valid cases back to a prediction variable in the tmp set:

tmp[complete.cases(tmp), "predict"] <- predict(model, newdata=tmp[complete.cases(tmp),]) 
tmp

#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species    predict
#1             NA         3.5          1.4         0.2     setosa       <NA>
#2            4.9         3.0          1.4         0.2     setosa     setosa
#3            4.7         3.2          1.3         0.2     setosa     setosa
# ...
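
If you need this more than once, the complete.cases() approach above could be wrapped in a small helper. This is a rough sketch, not part of e1071: predict_with_na is a hypothetical name, and it assumes a classification model whose predictions are a factor.

```r
library(e1071)

# Hypothetical helper: predict on complete rows only, return NA elsewhere.
# Assumes predict() returns a factor (classification).
predict_with_na <- function(model, newdata) {
  ok <- complete.cases(newdata)
  pred <- predict(model, newdata[ok, , drop = FALSE])
  # Start with an all-NA factor that has the same levels as the predictions.
  out <- factor(rep(NA, nrow(newdata)), levels = levels(pred))
  out[ok] <- pred
  out
}

model <- svm(Species ~ ., data = iris)
tmp <- iris
tmp[1, "Sepal.Length"] <- NA

res <- predict_with_na(model, tmp)
print(length(res))
print(is.na(res[1]))
```

complete.cases() looks at every column of newdata, so a row with an NA in a column the model does not use would also get an NA prediction; pass only the relevant columns if that matters.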

Upvotes: 1

Derek Corcoran

Reputation: 4082

If you are familiar with the caret package, which lets you fit 233 different types of models (including SVM from package e1071), its documentation has a section called "models clustered by tag similarity" where you can find a CSV with the data they used to group the algorithms.

There is a column there called Handle Missing Predictor Data, which tells you which algorithms can do what you want. Unfortunately SVM is not included there, but these algorithms are:

  • Boosted Classification Trees (ada)
  • Bagged AdaBoost (AdaBag)
  • AdaBoost.M1 (AdaBoost.M1)
  • C5.0 (C5.0)
  • Cost-Sensitive C5.0 (C5.0Cost)
  • Single C5.0 Ruleset (C5.0Rules)
  • Single C5.0 Tree (C5.0Tree)
  • CART (rpart)
  • CART (rpart1SE)
  • CART (rpart2)
  • Cost-Sensitive CART (rpartCost)
  • CART or Ordinal Responses (rpartScore)

If you still insist on using SVM, you could use the knnImpute method of the preProcess function from the same package, which should allow you to predict for all your observations.
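
A minimal sketch of that imputation route, assuming the iris example from the question. Note that knnImpute also centers and scales the predictors, so in this sketch the SVM is trained on the preprocessed data as well. The names pp, train and new_x are only for illustration.

```r
library(e1071)
library(caret)

# Fit the imputation (plus centering and scaling) on the training predictors.
pp <- preProcess(iris[, 1:4], method = "knnImpute")

# Train the SVM on the preprocessed predictors.
train <- cbind(predict(pp, iris[, 1:4]), Species = iris$Species)
model <- svm(Species ~ ., data = train)

tmp <- iris
tmp[1, "Sepal.Length"] <- NA

# Apply the same preprocessing to the new data; the NA is filled in from neighbours.
new_x <- predict(pp, tmp[, 1:4])
pred <- predict(model, new_x)
print(length(pred))
```

Keep in mind that this gives a prediction for the incomplete row based on imputed values, rather than returning NA for it.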

Upvotes: 4
