Run2
Run2

Reputation: 1919

svm conceptual query

I have some basic conceptual queries on SVM - it will be great if any one can guide me on this. I have been studying books and lectures for a while but have not been able to get answers for these queries correctly

  1. Suppose I have m featured data points - m > 2. How will I know if the data points are linearly separable or not?. If I have understood correctly, linearly separable data points - will not need any special kernel for finding the hyper plane as there is no need to increase the dimension.

  2. Say, I am not sure whether the data is linearly separable or not. I try to get a hyper plane with linear kernel, once with slackness and once without slackness on the lagrange multipliers. What difference will I see on the error rates on training and test data for these two hyper planes. If I understood correctly, if the data is not linearly separable, and if I am not using slackness then there cannot be any optimal plane. If that is the case, should the svm algorithm give me different hyper planes on different runs. Now when I introduce slackness - should I always get the same hyper plane, every run ? And how exactly can I find out from the lagrange multipliers of a hyper plane, whether the data was linearly separable or not.

  3. Now say from 2 I came to know somehow that the data was not linearly separable at m dimensions. So I will try to increase the dimensions and see if it is separable at a higher dimension. How do I know how high I will need to go ? I know the calculations do not go into that space - but is there any way to find out from 2 what should be the best kernel for 3 (i.e I want to find a linearly separating hyper plane).

  4. What is the best way to visualize hyper planes and data points in Matlab where the feature dimensions can be as big as 60 - and the hyperplane is at > 100 dimensions (i,e data points in few hundreds and using Gaussian Kernels the feature vector changes to > 100 dimensions).

I will really appreciate if someone clears these doubts Regards

Upvotes: 1

Views: 223

Answers (1)

carlosdc
carlosdc

Reputation: 12142

I'm going to try to focus on your questions (1), (2) and (3). In practice the most important concern is not if the problem becomes linearly separable but how well the classifier performs on unseen data (i.e. how well it classifies). It seems you want to find a good kernel for which data is linearly separable, and you will always be able to do this (consider putting at each training point an extremely narrow gaussian RBF), but what you really want is good performance on unseen data. That being said:

  • If the problem is not linearly separable and not using slacks the optimization will fail. It depends on the implementation and the specific optimization algorithm how it fails, does it not converge?, does it not find a descent direction? does it run into numerical difficulties? Even if you wanted to determine cases with slacks, you can still run into numerical difficulties that would make and that alone would make your algorithm of linear separability unreliable
  • How high do you need to go? Well that is a fundamental question. It is called the problem of data representation. For straight forward solutions people use held out data (people don't care about linear separability they care about good performance on held out data) and do parameter search (for example an RBF kernel can is strictly more expressive than a linear kernel) under the correct gammas. So the problem becomes finding a good gamma for your data. See for example this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.880
  • I don't think there is a trivial connection between the values of the lagrangian multipliers and linear separability. You can try an high alphas whose value is C, but I'm not sure you'll be able to say much.

Upvotes: 1

Related Questions