K.sabrina

Reputation: 1

Why do we calculate the margin in SVM?

I'm learning SVMs (Support Vector Machines); several points remain ambiguous to me (linearly separable, primal case).

I know how to find the weight w and the hyperplane equation, but if we can deduce the support vectors from it, why do we calculate the margin? What do I need to calculate first, and in which case? (Sorry for the mixed questions, but I'm really lost with this.)

I saw in some examples that the margin is calculated in this manner:

1 / ||w||

while in others, this way :

2 / ||w||

so what is the difference between those two cases?
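For reference, here's how I've been checking the two quantities in code (a minimal sketch, assuming scikit-learn's SVC with a linear kernel; the toy data is just made up):

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: two clusters in 2-D (made-up values).
    X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.0],
                  [4.0, 4.0], [5.0, 5.0], [5.0, 3.0]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    # A large C approximates the hard-margin (linearly separable, primal) case.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]                  # weight vector of the separating hyperplane
    norm_w = np.linalg.norm(w)

    print("1 / ||w|| =", 1 / norm_w)  # distance from the boundary to one class
    print("2 / ||w|| =", 2 / norm_w)  # twice that distance

Both numbers come from the same w, so I suspect they describe the same geometry, just measured differently.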

Thanks

Upvotes: 0

Views: 3348

Answers (2)

Bastian

Reputation: 1593

The margin between the separating hyperplane and the class boundaries of an SVM is an essential feature of this algorithm.

See, you have two hyperplanes: (1) w^T x + b >= 1 if y = 1, and (2) w^T x + b <= -1 if y = -1. This says that any vector with label y = 1 must lie either on or behind hyperplane (1). The same applies to the vectors with label y = -1 and hyperplane (2).

Note: if those requirements can be fulfilled, it implicitly means the dataset is linearly separable. This makes sense, because otherwise no such margin can be constructed.
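As a quick numeric sanity check (a minimal sketch; w, b and the points are hand-picked for illustration, not fitted values):

    import numpy as np

    # Hand-picked parameters of the separating hyperplane (illustrative only).
    w = np.array([1.0, 1.0])
    b = -4.0

    # Points with y = +1 should satisfy w^T x + b >= 1 (constraint (1)),
    # points with y = -1 should satisfy w^T x + b <= -1 (constraint (2)).
    X = np.array([[3.0, 2.0], [4.0, 3.0],    # labeled y = +1
                  [1.0, 1.0], [0.0, 2.0]])   # labeled y = -1
    y = np.array([1, 1, -1, -1])

    values = y * (X @ w + b)   # combined form: y (w^T x + b) >= 1
    print(values)              # all entries are >= 1, so (1) and (2) hold

If the data were not separable, no choice of w and b could make all those values >= 1.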

So, what an SVM tries to find is a decision boundary that is halfway between (1) and (2). Let's define this boundary as (3) w^T x + b = 0. What you see here is that (1), (2) and (3) are parallel hyperplanes, because they share the same parameters w and b. The vector w holds the direction of those planes. Recall that a vector always has a direction and a magnitude/length.

The question is now: how can one calculate the hyperplane (3)? The inequalities (1) and (2) tell us that any vector with label y = 1 which is closest to (3) lies exactly on hyperplane (1), hence (1) becomes w^T x + b = 1 for such x. The same applies to the closest vectors with a negative label and (2). The vectors on the planes are called 'support vectors', and the decision boundary (3) depends only on those, because one can simply subtract (2) from (1) for the support vectors and get:

(w^T x1 + b) - (w^T x2 + b) = 1 - (-1)  =>  w^T x1 - w^T x2 = 2

Note: x1 and x2 are different support vectors, one on each plane.
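One can also see this with a library (a minimal sketch assuming scikit-learn; support_vectors_ and decision_function are sklearn's API, not part of the derivation above): the fitted support vectors land exactly on planes (1) and (2).

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
    y = np.array([-1, -1, 1, 1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin

    # decision_function returns w^T x + b; on the support vectors it is
    # approximately +1 or -1 (up to solver tolerance).
    print(clf.decision_function(clf.support_vectors_))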

Now, we want the direction of w while ignoring its length, to get the shortest distance between (3) and the other planes. This distance is a perpendicular line segment from (3) to the others. To do so, one can divide by the length of w to get the unit vector perpendicular to (3), hence (w^T x1 - w^T x2)/||w|| = 2/||w||. The left-hand side is the projection of x1 - x2 onto that unit normal, i.e. exactly the distance between the two planes, so the distance is 2/||w||. This distance must be maximized. (This also answers the question above: 2/||w|| is the full margin between planes (1) and (2), while 1/||w|| is the distance from the decision boundary (3) to either plane, i.e. half the margin.)
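The projection step can be replayed numerically (a minimal sketch; w, b and the two support vectors are hand-picked for illustration):

    import numpy as np

    w = np.array([1.0, 1.0])
    b = -4.0
    x1 = np.array([3.0, 2.0])    # on plane (1): w^T x1 + b = +1
    x2 = np.array([1.5, 1.5])    # on plane (2): w^T x2 + b = -1

    u = w / np.linalg.norm(w)    # unit vector perpendicular to (3)
    dist = u @ (x1 - x2)         # projection of x1 - x2 onto the normal

    print(dist, 2 / np.linalg.norm(w))   # both print the same value: 2/||w||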

Edit: as others state here, use Lagrange multipliers or the SMO algorithm to minimize the term 1/2 ||w||^2 s.t. y_i (w^T x_i + b) >= 1 for all i. This is the convex form of the optimization problem for the primal SVM.
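For what it's worth, the convex primal can also be handed to a generic solver directly (a minimal sketch assuming the cvxpy package; dedicated SVM solvers use Lagrange multipliers/SMO instead, as stated above):

    import cvxpy as cp
    import numpy as np

    # Toy linearly separable data (made-up values).
    X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
    y = np.array([-1.0, -1.0, 1.0, 1.0])

    w = cp.Variable(2)
    b = cp.Variable()

    # Primal hard-margin SVM: minimize 1/2 ||w||^2  s.t.  y_i (w^T x_i + b) >= 1.
    objective = cp.Minimize(0.5 * cp.sum_squares(w))
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(objective, constraints).solve()

    print("w =", w.value, " b =", b.value, " margin =", 2 / np.linalg.norm(w.value))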

Upvotes: 0

codeslord

Reputation: 2368

The optimization objective of an SVM is to choose w and b in such a way that the margin around the separating hyperplane is maximized (equivalently, ||w|| is minimized subject to the margin constraints).

Mathematically speaking, it is a constrained quadratic optimization task which is solved via the KKT (Karush-Kuhn-Tucker) conditions, using Lagrange multipliers.
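To connect this to code (a minimal sketch assuming scikit-learn, where dual_coef_ is sklearn's name for y_i * alpha_i, the label-signed Lagrange multipliers of the support vectors):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
    y = np.array([-1, -1, 1, 1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    # alpha_i is nonzero only for the support vectors; sklearn exposes
    # y_i * alpha_i as dual_coef_, and w = sum_i y_i alpha_i x_i.
    print("support vectors:\n", clf.support_vectors_)
    print("y_i * alpha_i:", clf.dual_coef_)
    print("w recovered from multipliers:", clf.dual_coef_ @ clf.support_vectors_)
    print("w from sklearn (coef_):      ", clf.coef_)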

The following video explains this in simple terms for the linearly separable case:

https://www.youtube.com/watch?v=1NxnPkZM9bc

How this is calculated is also explained in more detail here, covering the linearly separable and primal cases:

https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf

Upvotes: 1
