I'm using svm from e1071 on a dataset like this:
sdewey <- svm(x = as.matrix(trainS),
              y = trainingSmall$DEWEY,
              type = "C-classification")
That works just fine, but when I try to tune the cost and gamma like this:
svm_tune <- tune(svm, train.x=as.matrix(trainS), train.y=trainingSmall$DEWEY, type="C-classification", ranges=list(cost=10^(-1:6), gamma=1^(-1:1)))
I get this error:
Error in tune(svm, train.x = as.matrix(trainS), train.y = trainingSmall$DEWEY, : Dependent variable has wrong type!
The structure of my training data is this, but with many more lines:
'data.frame': 1000 obs. of 1542 variables:
$ women.prisoners : int 1 0 0 0 0 0 0 0 0 0 ...
$ reformatories.for.women : int 1 0 0 0 0 0 0 0 0 0 ...
$ women : int 1 0 0 0 0 0 0 0 0 0 ...
$ criminal.justice : int 1 0 0 0 0 0 0 0 0 0 ...
$ soccer : int 0 1 0 0 0 0 0 0 0 0 ...
$ coal.mines.and.mining : int 0 0 1 0 0 0 0 0 0 0 ...
$ coal : int 0 0 1 0 0 0 0 0 0 0 ...
$ engineering.geology : int 0 0 1 0 0 0 0 0 0 0 ...
$ family.violence : int 0 0 0 1 0 0 0 0 0 0 ...
It is a multi-class problem. I'm not sure how to solve this, or whether there are other ways to find the optimal values for the cost and gamma parameters.
Here is an example of my data; trainS is that data without the first 4 columns (DEWEY, D1, D2 and D3).
Thanks
library(e1071)
trainingSmall <- read.csv("trainingSmallExtra.csv")
# Drop the first 4 columns (DEWEY, D1, D2, D3) so only the features remain
sdewey <- svm(x = as.matrix(trainingSmall[, -(1:4)]),
              y = trainingSmall$DEWEY,
              type = "C-classification",
              kernel = "linear"  # no kernel trick; svm's default is "radial"
)
This works because svm automatically converts DEWEY to a factor.
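A small way to check this behaviour, sketched on the built-in iris data rather than the asker's file (the iris setup and the variable names x, y_int, fit and pred are illustrative assumptions, not part of the original problem):

```r
library(e1071)

# Illustrative stand-in for the real data: numeric features, integer class codes
x <- as.matrix(iris[, 1:4])
y_int <- as.integer(iris$Species)  # 1, 2, 3 instead of a factor

# svm() accepts the integer response when classification is requested explicitly
fit <- svm(x = x, y = y_int, type = "C-classification")

# Inspect what type the predictions come back as, and how they line up
pred <- predict(fit, x)
class(pred)
table(pred, y_int)
```

Checking class(pred) shows what svm did with the integer labels internally.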
The tune call failed because tune, being built for user customization, does not do that conversion for you and relies on you to supply the correct data type. Since DEWEY was an integer rather than a factor, it failed. We can fix this:
trainingSmall$DEWEY <- as.factor(trainingSmall$DEWEY)
# The way I'm formatting your code is Google's R style
svm_tune <- tune(svm,
                 train.x = as.matrix(trainingSmall[, -(1:4)]),  # features only
                 train.y = trainingSmall$DEWEY,
                 kernel = "linear",
                 type = "C-classification",
                 ranges = list(
                   cost = 10^(-1:6),
                   gamma = 1^(-1:1)  # evaluates to c(1, 1, 1); unused by a linear kernel
                 )
)
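Once the response is a factor, the tuning result can be inspected to pick cost and gamma. The sketch below uses iris and a radial kernel as stand-ins; both are assumptions for illustration, as are the small grid, the 5-fold setting and the name iris_tune. It uses 10^(-2:0) for gamma, since 1^(-1:1) evaluates to c(1, 1, 1) and so only ever tries a single value.

```r
library(e1071)
set.seed(42)  # tune() assigns cross-validation folds at random

x <- as.matrix(iris[, 1:4])
y <- iris$Species  # already a factor, as tune() expects for classification

iris_tune <- tune(svm,
                  train.x = x,
                  train.y = y,
                  type = "C-classification",
                  kernel = "radial",
                  ranges = list(cost = 10^(-1:2), gamma = 10^(-2:0)),
                  tunecontrol = tune.control(cross = 5))

summary(iris_tune)            # error for every cost/gamma combination
iris_tune$best.parameters     # one-row data frame with the winning values
iris_tune$best.performance    # cross-validated error of that combination
best_fit <- iris_tune$best.model  # svm refit on all data with the best values
```

The same fields should be available on svm_tune from the call above, so the best cost can be read from svm_tune$best.parameters.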