user3625340

Reputation: 43

What is the difference between a hyperplane and a plane? And why is a hyperplane represented using the equation w^T x + b = 0?

I need a brief idea about SVM, so please help me understand the representation of a hyperplane and the idea of a kernel.

Upvotes: 1

Views: 1390

Answers (1)

lejlot

Reputation: 66825

While the first part of @JeffHeaton's answer is nice, the rest does not answer the OP's question, so some further details:

Formula

Why is the hyperplane equation w^T x + b = 0? First, you have to be aware of what w^T x = <w, x> does with x. It basically projects x (a vector starting at (0,0)) onto w (which is also a vector starting at (0,0)), so as a result you get either a positive number (the angle between x and w is less than 90 degrees), zero (they are perpendicular), or a negative number (the angle is bigger than 90 degrees). So you can see that it equals 0 iff the two vectors are perpendicular; the only thing left is the distance from the origin (0,0), which is handled by adding the constant b. From the geometrical point of view, w is the so-called "normal to the hyperplane", simply a vector perpendicular to the hyperplane. So if you calculate <w, x> and get 0, x is perpendicular to w, which in turn is perpendicular to the hyperplane through the origin, so x lies in that hyperplane; the constant b just shifts it away from the origin.
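To make the geometry concrete, here is a minimal numpy sketch; the particular w and b are made up for illustration, not taken from any trained SVM:

    import numpy as np

    # Hypothetical 2-D example: w is the normal vector of the hyperplane,
    # b shifts the hyperplane away from the origin.
    w = np.array([1.0, 2.0])   # normal to the hyperplane
    b = -3.0                   # offset from the origin

    def side(x):
        """Signed value of w^T x + b: 0 means x lies on the hyperplane,
        the sign tells on which side of it x falls."""
        return np.dot(w, x) + b

    print(side(np.array([1.0, 1.0])))   #  0.0 -> on the hyperplane
    print(side(np.array([3.0, 3.0])))   #  6.0 -> positive side
    print(side(np.array([0.0, 0.0])))   # -3.0 -> negative side

The sign of w^T x + b is exactly what a linear SVM uses as its decision rule.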

Kernel

A kernel is nothing more than the scalar product <., .> used in the previous formula (written there as <w, x>). The only reason for writing K(x, y) instead is that it assumes you have some "magical" mapping phi into some different space. In other words, if you have some function phi which rearranges your points in a way that makes them easier to classify, then you can train a linear SVM on phi(X), L instead of X, L (where L are the correct labels). The problem is that it is very hard to find a good phi. In practice, we simply choose a somewhat arbitrary phi which maps points to a higher dimension. It is a known mathematical fact that in a higher dimension points are easier to separate. In particular, if you have N points x_1, ..., x_N, you can always select a phi such that phi(x_i) = [0 0 0 ... 1 ... 0], where the 1 appears in the i-th position.

Unfortunately, such phi(X) are expensive to calculate, so we use a kernel function instead, defined as K(x, y) = <phi(x), phi(y)>. So we do not have to compute phi explicitly; we just need the scalar product between the images of the points under phi. And this is exactly what kernels do: they denote scalar products in some different space. In particular, the RBF kernel maps each point into a FUNCTION (in fact, a Gaussian), so phi(x) has infinite dimension and cannot be calculated explicitly, but the scalar product between two functions is just the integral of their product, which is a much easier object to work with.
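Here is a small sketch of the idea that K(x, y) = <phi(x), phi(y)>. It uses the homogeneous polynomial kernel of degree 2 rather than the RBF kernel, simply because its phi is finite and easy to write down; the vectors x and y are arbitrary:

    import numpy as np

    def phi(x):
        """Explicit feature map for the degree-2 homogeneous polynomial kernel
        in 2 dimensions: phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]."""
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def K(x, y):
        """The same quantity computed as a kernel: K(x, y) = (<x, y>)^2."""
        return np.dot(x, y) ** 2

    x = np.array([1.0, 2.0])
    y = np.array([3.0, 4.0])

    print(np.dot(phi(x), phi(y)))  # 121.0 -- scalar product in the mapped space
    print(K(x, y))                 # 121.0 -- identical, without ever computing phi

For the RBF kernel the mapped space is infinite-dimensional, so only the K(x, y) route is feasible, but the principle is the same.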

Upvotes: 2
