Reputation: 1919
I have some basic conceptual questions about SVMs, and it would be great if anyone could guide me on them. I have been studying books and lectures for a while but have not been able to find clear answers to these questions.
1. Suppose I have data points with m features, m > 2. How can I tell whether the data points are linearly separable? If I have understood correctly, linearly separable data points do not need any special kernel to find the hyperplane, as there is no need to increase the dimension.
2. Say I am not sure whether the data is linearly separable. I try to find a hyperplane with a linear kernel, once with slack and once without slack on the Lagrange multipliers. What difference will I see in the error rates on training and test data for these two hyperplanes? If I understood correctly, if the data is not linearly separable and I am not using slack, then there cannot be any optimal hyperplane. If that is the case, should the SVM algorithm give me different hyperplanes on different runs? And when I introduce slack, should I always get the same hyperplane on every run? Also, how exactly can I tell from the Lagrange multipliers of a hyperplane whether the data was linearly separable?
3. Now say that from 2 I somehow learned that the data is not linearly separable in m dimensions. So I will try to increase the dimension and see whether it is separable in a higher dimension. How do I know how high I need to go? I know the calculations are not carried out in that space, but is there any way to find out from 2 what the best kernel for 3 would be (i.e. I want to find a linearly separating hyperplane)?
4. What is the best way to visualize hyperplanes and data points in Matlab when the feature dimension can be as large as 60 and the hyperplane lives in > 100 dimensions (i.e. a few hundred data points, and with Gaussian kernels the feature vector grows to > 100 dimensions)?
I would really appreciate it if someone could clear up these doubts. Regards
Upvotes: 1
Views: 223
Reputation: 12142
I'm going to try to focus on your questions (1), (2) and (3). In practice, the most important concern is not whether the problem becomes linearly separable but how well the classifier performs on unseen data (i.e. how well it generalizes). It seems you want to find a good kernel for which the data is linearly separable, and you will always be able to do this (consider placing an extremely narrow Gaussian RBF at each training point), but what you really want is good performance on unseen data. That being said:
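As a side illustration of the narrow-RBF remark, here is a minimal Python sketch using only the standard library. It is not an SVM solver: every kernel weight is simply fixed to 1 and the bias to 0, which is an assumption made for illustration and not the max-margin solution. The helper names `rbf` and `decision`, the XOR-style toy points, and the value `gamma = 50.0` are all hypothetical choices.

```python
import math

def rbf(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision(x, points, labels, gamma):
    """Kernel expansion f(x) = sum_i y_i * k(x_i, x), all weights fixed to 1 (not trained)."""
    return sum(y * rbf(p, x, gamma) for p, y in zip(points, labels))

# XOR-style toy data: not linearly separable in the original 2-D space.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

gamma = 50.0  # large gamma means a very narrow Gaussian (illustrative value)

# Each training point is dominated by its own bump, so the sign matches its label.
train_scores = [decision(p, points, labels, gamma) for p in points]
train_correct = all(s * y > 0 for s, y in zip(train_scores, labels))
print("all training points classified correctly:", train_correct)

# Away from the training points the bumps have decayed, so the score is near zero.
unseen_score = decision((0.5, 0.2), points, labels, gamma)
print("score at an unseen point:", unseen_score)
```

With bumps this narrow, the training set is separated in the kernel-induced space no matter how the labels are assigned, yet the score at a point away from the training data is essentially zero and carries almost no information. That is the gap between separability and performance on unseen data.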
Upvotes: 1