minigeek

Reputation: 3166

Wrong prediction in linear SVM

I am writing an R script which, when run, gives the predicted value of the dependent variable. All of my variables are categorical (as shown in the picture) and assigned a number; there are 101 classes in total (each class is a song name).

So I have a training dataset which contains records like {(2,5,6,1)→82, (2,5,6,1)→45, (2,5,3,1)→34, ...}. I trained on this dataset using a linear SVM in RStudio, and for some values of (x,y,z,w) it gives correct answers. But even though records like (2,5,6,1)→X exist in the training dataset, why doesn't it predict 82 or 45? I am pretty confused, as it ignores these records and outputs an entirely new value, 23.

training_set = dataset;
library(e1071)
classifier = svm(formula = Song ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'linear')
y_pred = predict(classifier, data.frame(Emotion = 2, Pact = 5, Mact = 6, Session = 1))

What I want is for my answer to come as close as possible. What can I do to achieve these goals?

  1. Get at least the 10 closest outcomes instead of 1 in R.
  2. Is a linear SVM model a good choice here?
  3. How do I get values like 82 and 45 from the training dataset, and if no matching entry is present, find the closest one? (Is there a model for this beyond simple Euclidean distance?)

Upvotes: 2

Views: 497

Answers (1)

Maurits Evers

Reputation: 50738

What makes you think that your classifier will predict the same outcome for a set of predictors as your original observation? I think there may be a fundamental misconception here about how classification works: a fitted model does not memorise and replay the training data.

Here is a simple counter-example using a linear regression model. The same principle applies to your SVM.

  1. Simulate some data

    set.seed(2017);
    x <- 1:10;
    y <- x + rnorm(10);
    
  2. We now modify one value of y and show the data of (x,y) pairs.

    y[3] = -10;
    df <- cbind.data.frame(x = x, y = y);
    df;
    #    x          y
    #1   1   2.434201
    #2   2   1.922708
    #3   3 -10.000000
    #4   4   2.241395
    #5   5   4.930175
    #6   6   6.451906
    #7   7   5.041634
    #8   8   7.998476
    #9   9   8.734664
    #10 10  11.563223
    
  3. Fit a model and get predictions.

    fit <- lm(y ~ x, data = df);
    pred <- predict(fit);
    
  4. Let's take a look at predicted responses y.pred and compare them to the original data (x, y).

    data.frame(df, y.pred = pred)
    #    x          y     y.pred
    #1   1   2.434201 -2.1343357
    #2   2   1.922708 -0.7418526
    #3   3 -10.000000  0.6506304
    #4   4   2.241395  2.0431135
    #5   5   4.930175  3.4355966
    #6   6   6.451906  4.8280796
    #7   7   5.041634  6.2205627
    #8   8   7.998476  7.6130458
    #9   9   8.734664  9.0055288
    #10 10  11.563223 10.3980119
    

Note how the predicted response for x=3 is y.pred=0.65 even though you observed y=-10.
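As for getting more than a single predicted class (your point 1): one option is to fit the SVM with probability estimates enabled and rank the classes by predicted probability. Here is a minimal sketch, assuming the `e1071` package is installed; the toy data below merely stands in for your real (Emotion, Pact, Mact, Session) → Song dataset, which we don't have.

```r
library(e1071)

# Toy stand-in data: two categorical predictors, four song classes
set.seed(1)
train <- data.frame(
  Emotion = factor(sample(1:3, 60, replace = TRUE)),
  Pact    = factor(sample(1:3, 60, replace = TRUE)),
  Song    = factor(sample(c("a", "b", "c", "d"), 60, replace = TRUE))
)

# probability = TRUE makes svm() fit a probability model as well
clf <- svm(Song ~ ., data = train,
           type = "C-classification", kernel = "linear",
           probability = TRUE)

# New observation must use the same factor levels as the training data
newdata <- data.frame(Emotion = factor(2, levels = levels(train$Emotion)),
                      Pact    = factor(1, levels = levels(train$Pact)))

pred  <- predict(clf, newdata, probability = TRUE)
probs <- attr(pred, "probabilities")   # one column per class

# The k most likely classes, ranked by predicted probability
k <- 3
top_k <- names(sort(probs[1, ], decreasing = TRUE))[seq_len(k)]
top_k
```

With your 101 song classes you would set `k <- 10` to get the ten most likely songs instead of one.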

Upvotes: 1
