minigeek

Reputation: 3166

Wrong prediction in linear SVM

I am writing an R script which, when run, gives the predicted value of the dependent variable. All of my variables are categorical (as shown in the picture) and assigned a number; there are 101 classes in total (each class is a song name).

So I have a training dataset which contains records like {(2,5,6,1)→82, (2,5,6,1)→45, (2,5,3,1)→34, ...}. I trained on this dataset using a linear SVM in RStudio, and for some values of (x,y,z,w) it gives correct answers. But even though records like (2,5,6,1)→X exist in the training dataset, why doesn't it predict 82 or 45? I am pretty confused, as it ignores these records and outputs an entirely new value, 23.

training_set = dataset;
library(e1071)
classifier = svm(formula = Song ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'linear')
y_pred = predict(classifier, data.frame(Emotion = 2, Pact = 5, Mact = 6, Session = 1))

What I want is for my answer to come as close as possible. What can I do to achieve these goals?

  1. Get at least the 10 closest outcomes instead of 1 in R.
  2. Is a linear SVM model a good choice here?
  3. How do I get values like 82 and 45 from the training dataset, and if no matching entry is present, find the closest one? (Is there a model for this beyond simple Euclidean distance?)

Upvotes: 2

Views: 497

Answers (1)

Maurits Evers

Reputation: 50738

What makes you think that your classifier will predict the same outcome for a set of predictors as your original observation? I think there may be a fundamental misconception here about how classification works: a fitted model does not memorise and replay the training data.

Here is a simple counter-example using a linear regression model. The same principle applies to your SVM.

  1. Simulate some data

    set.seed(2017);
    x <- 1:10;
    y <- x + rnorm(10);
    
  2. We now modify one value of y and show the data of (x,y) pairs.

    y[3] = -10;
    df <- cbind.data.frame(x = x, y = y);
    df;
    #    x          y
    #1   1   2.434201
    #2   2   1.922708
    #3   3 -10.000000
    #4   4   2.241395
    #5   5   4.930175
    #6   6   6.451906
    #7   7   5.041634
    #8   8   7.998476
    #9   9   8.734664
    #10 10  11.563223
    
  3. Fit a model and get predictions.

    fit <- lm(y ~ x, data = df);
    pred <- predict(fit);
    
  4. Let's take a look at predicted responses y.pred and compare them to the original data (x, y).

    data.frame(df, y.pred = pred)
    #    x          y     y.pred
    #1   1   2.434201 -2.1343357
    #2   2   1.922708 -0.7418526
    #3   3 -10.000000  0.6506304
    #4   4   2.241395  2.0431135
    #5   5   4.930175  3.4355966
    #6   6   6.451906  4.8280796
    #7   7   5.041634  6.2205627
    #8   8   7.998476  7.6130458
    #9   9   8.734664  9.0055288
    #10 10  11.563223 10.3980119
    

Note how the predicted response for x=3 is y.pred=0.65 even though you observed y=-10.
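As for getting more than a single predicted class (your point 1): one option is to fit the SVM with probability estimates enabled and rank the classes by predicted probability. Here is a minimal sketch, assuming the `e1071` package is installed; the toy data below merely stands in for your real (Emotion, Pact, Mact, Session) → Song dataset, which we don't have.

```r
library(e1071)

# Toy stand-in data: two categorical predictors, four song classes
set.seed(1)
train <- data.frame(
  Emotion = factor(sample(1:3, 60, replace = TRUE)),
  Pact    = factor(sample(1:3, 60, replace = TRUE)),
  Song    = factor(sample(c("a", "b", "c", "d"), 60, replace = TRUE))
)

# probability = TRUE makes svm() fit a probability model as well
clf <- svm(Song ~ ., data = train,
           type = "C-classification", kernel = "linear",
           probability = TRUE)

# New observation must use the same factor levels as the training data
newdata <- data.frame(Emotion = factor(2, levels = levels(train$Emotion)),
                      Pact    = factor(1, levels = levels(train$Pact)))

pred  <- predict(clf, newdata, probability = TRUE)
probs <- attr(pred, "probabilities")   # one column per class

# The k most likely classes, ranked by predicted probability
k <- 3
top_k <- names(sort(probs[1, ], decreasing = TRUE))[seq_len(k)]
top_k
```

With your 101 song classes you would set `k <- 10` to get the ten most likely songs instead of one.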

Upvotes: 1
