moondaisy

Reputation: 4491

Tune SVM in R - Dependent variable has wrong type

I'm using svm from e1071 for a dataset like this:

sdewey <- svm(x = as.matrix(trainS), 
              y = trainingSmall$DEWEY,
              type="C-classification")

That works just fine, but when I try to tune the cost and gamma like this:

svm_tune <- tune(svm,
                 train.x = as.matrix(trainS),
                 train.y = trainingSmall$DEWEY,
                 type    = "C-classification",
                 ranges  = list(cost = 10^(-1:6), gamma = 1^(-1:1)))

I get this error:

Error in tune(svm, train.x = as.matrix(trainS), train.y = trainingSmall$DEWEY, : Dependent variable has wrong type!

The structure of my training data is this, but with many more lines:

'data.frame':   1000 obs. of  1542 variables:
 $ women.prisoners                                  : int  1 0 0 0 0 0 0 0 0 0 ...
 $ reformatories.for.women                          : int  1 0 0 0 0 0 0 0 0 0 ...
 $ women                                            : int  1 0 0 0 0 0 0 0 0 0 ...
 $ criminal.justice                                 : int  1 0 0 0 0 0 0 0 0 0 ...
 $ soccer                                           : int  0 1 0 0 0 0 0 0 0 0 ...
 $ coal.mines.and.mining                            : int  0 0 1 0 0 0 0 0 0 0 ...
 $ coal                                             : int  0 0 1 0 0 0 0 0 0 0 ...
 $ engineering.geology                              : int  0 0 1 0 0 0 0 0 0 0 ...
 $ family.violence                                  : int  0 0 0 1 0 0 0 0 0 0 ...

It is a multi-class problem. I'm not sure how to solve this, or whether there are other ways to find the optimal values for the cost and gamma parameters.

Here is an example of my data; trainS is that data without the first 4 columns (DEWEY, D1, D2 and D3).

Thanks

Upvotes: 2

Views: 3900

Answers (1)

Hack-R

Reputation: 23216

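library(e1071)

trainingSmall <- read.csv("trainingSmallExtra.csv")

# Predictors: every column after the first 4 (DEWEY, D1, D2, D3), as in the question
sdewey <- svm(x      = as.matrix(trainingSmall[, 5:ncol(trainingSmall)]),
              y      = trainingSmall$DEWEY,
              type   = "C-classification",
              kernel = "linear"  # linear kernel, i.e. no kernel transformation
              )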

This works because svm has automatically converted DEWEY to a factor.

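The call to tune failed because tune is a generic wrapper built for user customization and does not convert the response for you; it relies on you to supply the correct data type. Since DEWEY was an integer instead of a factor, tune could not compare it with the factor predictions that svm returns, and it stopped. We can fix this: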

trainingSmall$DEWEY <- as.factor(trainingSmall$DEWEY)
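
If you want to check this behaviour without the original CSV, here is a minimal sketch that uses the built-in iris data as a stand-in (an assumption on my part; it is not your data). The names y_int, y_fac, msg and fit are only illustrative. It runs the same kind of tune call twice, once with integer class codes and once with the same labels as a factor:

```r
library(e1071)

set.seed(1)
x     <- as.matrix(iris[, 1:4])
y_int <- as.integer(iris$Species)  # integer class codes, like the original DEWEY
y_fac <- as.factor(y_int)          # the same labels stored as a factor

# Integer response: tune() stops when it tries to score the predictions.
msg <- tryCatch({
  tune(svm, train.x = x, train.y = y_int,
       type = "C-classification", ranges = list(cost = c(1, 10)))
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Factor response: the same call runs to completion.
fit <- tune(svm, train.x = x, train.y = y_fac,
            type = "C-classification", ranges = list(cost = c(1, 10)))
print(class(fit))
```

On the integer response the captured message should be the same "Dependent variable has wrong type!" error as in the question.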

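# Formatted following Google's R style guide
svm_tune <- tune(svm,
                 train.x = as.matrix(trainingSmall[, 5:ncol(trainingSmall)]),  # predictors only
                 train.y = trainingSmall$DEWEY,  # now a factor
                 kernel  = "linear",
                 type    = "C-classification",
                 ranges  = list(
                   cost  = 10^(-1:6),
                   # The original 1^(-1:1) evaluates to c(1, 1, 1); 10^(-1:1) is
                   # assumed here. A linear kernel does not use gamma at all.
                   gamma = 10^(-1:1)
                 ))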
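
Once tune has finished, the returned object holds the grid results. The sketch below again uses iris as a stand-in for your data, with a radial kernel so that gamma actually has an effect; the grid values and the 5-fold setting are arbitrary choices for illustration, not recommendations. It also prints 1^(-1:1), which shows why the gamma grid from the question contains the same value three times:

```r
library(e1071)

# 1^k is 1 for every k, so this "grid" holds three identical values.
print(1^(-1:1))

set.seed(1)
fit <- tune(svm,
            train.x     = as.matrix(iris[, 1:4]),
            train.y     = iris$Species,  # already a factor
            type        = "C-classification",
            kernel      = "radial",
            ranges      = list(cost = 10^(-1:2), gamma = 10^(-2:0)),
            tunecontrol = tune.control(cross = 5))  # 5-fold cross-validation

print(summary(fit))           # error for every cost/gamma combination
print(fit$best.parameters)    # one-row data frame with the best combination
print(fit$best.performance)   # cross-validated error of that combination
best <- fit$best.model        # svm refit on all rows with the best parameters
```

You can then pass best to predict in the same way as any other svm model.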

Upvotes: 1
