Shahzad

Reputation: 2049

SVM Feature Selection using SCAD

Using the penalizedSVM R package, I am trying to do feature selection. I have a list of several data.frames called trainingdata.

trainingdata <- lapply(trainingdata, function(data) {
  levels(data$label) <- c(-1, 1)
  train_x <- data.matrix(data[, -1])   # features: every column except the label
  trainy <- data[, 1]                  # label is the first column
  print(which(!is.finite(train_x)))    # positions of NA/NaN/Inf entries, if any
  scad.fix <- svm.fs(train_x, y = trainy, fs.method = "scad",
                     cross.outer = 0, grid.search = "discrete",
                     lambda1.set = lambda1.scad, parms.coding = "none",
                     show = "none", maxIter = 1000, inner.val.method = "cv",
                     cross.inner = 5, seed = seed, verbose = FALSE)
  # xind indexes the columns of train_x, which has no label column, hence + 1
  data <- data[c(1, scad.fix$model$xind + 1)]
  data
})

Some iterations run fine, but on one data.frame I get the following error message:

[1] "feature selection method is scad"
Error in svd(m, nv = 0, nu = 0) : infinite or missing values in 'x'
Calls: lapply ... scadsvc -> .calc.mult.inv_Q_mat2 -> rank.condition -> svd

I also check whether train_x really contains infinite values with the following call, but it returns no indices for any of the preceding data.frames, nor for the one where the error occurs.

print(which(!is.finite(train_x)))

Is there any other way to check for infinite values? What else could I do to fix this error? And is there a way to determine the index of the data.frame currently being processed within lapply?

Upvotes: 1

Views: 1069

Answers (1)

agstudy

Reputation: 121568

For the first question: since the message is "infinite or missing values in 'x'", you can make the condition test explicitly for both cases, with something like:

   idx <- is.na(train_x) | is.infinite(train_x)

You can then replace these values, for example with 0:

   train_x[idx] <- 0
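The check-and-replace step above can be tried on a small toy matrix. In this sketch the matrix m is made-up illustrative data, not the asker's; counting the flagged entries per column is an extra step that may help locate the offending feature.

```r
# Toy numeric matrix with an NA, an Inf and a NaN (illustrative data only)
m <- matrix(c(1,   NA, 3,
              Inf, 5,  6,
              7,   8,  NaN), nrow = 3, byrow = TRUE)

# Flag missing (NA/NaN) and infinite (Inf/-Inf) entries
idx <- is.na(m) | is.infinite(m)

# Number of flagged entries per column, to spot a problematic feature
print(colSums(idx))

# Replace the flagged entries with 0, as suggested above
m[idx] <- 0
print(m)
```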

For the second question, about identifying the current data.frame within lapply: you can loop over the names of the data.frames instead of over the data.frames themselves, and do something like this:

 lapply(names(trainingdata), function(nm) {
   data <- trainingdata[[nm]]   # [[ ]] returns the data.frame itself; [ ] returns a one-element list
   ....
 })

For example:

> ll <- list(f = 1, c = 2)
> lapply(names(ll), function(x) ll[x])
[[1]]
[[1]]$f
[1] 1


[[2]]
[[2]]$c
[1] 2
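If you need the position rather than the name, a similar sketch loops over seq_along() so that the index i is available inside the function. The list ll is the toy list from the example above, and the multiplication is only a placeholder for the real per-element work.

```r
ll <- list(f = 1, c = 2)

# Loop over positions instead of elements, so the current index i is known
res <- lapply(seq_along(ll), function(i) {
  message("processing element ", i, " (", names(ll)[i], ")")
  ll[[i]] * 10   # placeholder for the real per-element work
})
print(res)
```

The same idea applies to trainingdata: with seq_along(trainingdata), an error message or print() inside the function can include i.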

EDIT

You can also wrap the call scad.fix <- svm.fs(...) in tryCatch, so that an error on one data.frame does not stop the rest of the lapply:

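   scad.fix <- tryCatch(
     svm.fs(....),          # same arguments as before
     error = function(e) e  # return the error object instead of aborting lapply
   )
   if (inherits(scad.fix, "error")) return(NULL)  # skip this data.frame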
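A minimal end-to-end sketch of this pattern, combined with an index-based loop so that the failing data.frame can be reported. Since penalizedSVM may not be installed, select_features() is a hypothetical stand-in for svm.fs() that fails the same way on non-finite input, and the two-element trainingdata list is made-up toy data. It assumes, as in the question, that the label is the first column and that xind indexes the columns of train_x.

```r
# Hypothetical stand-in for svm.fs(): errors when x contains NA/NaN/Inf
select_features <- function(x) {
  if (any(!is.finite(x))) stop("infinite or missing values in 'x'")
  list(model = list(xind = seq_len(ncol(x))))  # pretend every feature is kept
}

# Toy list of data.frames; the second one contains an Inf
trainingdata <- list(
  a = data.frame(label = factor(c("n", "y")), v1 = c(1, 2),   v2 = c(3, 4)),
  b = data.frame(label = factor(c("n", "y")), v1 = c(1, Inf), v2 = c(3, 4))
)

results <- lapply(seq_along(trainingdata), function(i) {
  data <- trainingdata[[i]]
  train_x <- data.matrix(data[, -1])
  fit <- tryCatch(
    select_features(train_x),
    error = function(e) {
      message("data.frame ", i, " failed: ", conditionMessage(e))
      NULL  # signal the failure to the code below
    }
  )
  if (is.null(fit)) return(NULL)
  # xind indexes columns of train_x, so shift by 1 to skip the label column
  data[c(1, fit$model$xind + 1)]
})
names(results) <- names(trainingdata)
```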

For example, testing it on the following list shows that execution continues to the end of the list, even though the list contains an NA:

lapply(list(1, NA, 2), function(x) {
  tryCatch(
    if (any(!is.finite(x)))
      stop("infinite or missing values in 'x'"),
    error = function(e) e
  )
})
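Since each failing element comes back as an error object, the failures can also be located after the loop has finished. This sketch repeats the toy example and then uses inherits() to find which positions failed.

```r
res <- lapply(list(1, NA, 2), function(x) {
  tryCatch(
    if (any(!is.finite(x)))
      stop("infinite or missing values in 'x'"),
    error = function(e) e
  )
})

# TRUE for every element whose call ended in an error
failed <- vapply(res, inherits, logical(1), what = "error")
print(which(failed))                   # positions of the failing inputs
print(conditionMessage(res[[which(failed)[1]]]))
```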

Upvotes: 1
