Reputation: 2837
A geometric margin is simply the Euclidean distance between a certain x (data point) and the hyperplane.
What is the intuitive meaning of a functional margin?
Note: I realize that a similar question has been asked here: How to understand the functional margin in SVM ?
However, the answer given there explains the equation, but not its meaning (as I understood it).
Upvotes: 21
Views: 23970
Reputation: 518
Without going into unnecessary complications, here, in the simplest terms, is how one can think of and relate the functional and geometric margins.
Think of the functional margin, represented as 𝛾̂, as a measure of the correctness of a classification for a data unit. For a data unit x with label y ∈ {-1, 1} and parameters w and b, the functional margin y(wx + b) is positive only when y and (wx + b) have the same sign, which is to say the data unit is correctly classified.
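A minimal sketch in plain Python of the functional margin y(w·x + b) for single points. The helper names `dot` and `functional_margin` and the numbers are made up for illustration. It shows that the sign flags correct versus incorrect classification, and that the value is a real number, not just ±1.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, x, y):
    """Functional margin y * (w.x + b) of example (x, y), with y in {-1, +1}."""
    return y * (dot(w, x) + b)


w, b = [2.0, -1.0], 0.5

# w.x + b = 4.5 > 0 and y = +1: correctly classified, positive margin.
print(functional_margin(w, b, [2.0, 0.0], 1))    # 4.5
# Same point labelled -1: misclassified, negative margin.
print(functional_margin(w, b, [2.0, 0.0], -1))   # -4.5
# w.x + b = -1.5 < 0 and y = -1: correctly classified, positive margin.
print(functional_margin(w, b, [0.0, 2.0], -1))   # 1.5
```

For fixed w and b, the magnitude grows as the point moves away from the hyperplane w·x + b = 0, which is where the notion of "confidence" comes from.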
But we do not rely only on whether we are correct or not in this classification. We need to know how correct we are, that is, what degree of confidence we have in this classification. For this we need a different measure, called the geometric margin, represented as 𝛾, which can be expressed as:
𝛾 = 𝛾̂ / ||𝑤||
So the geometric margin 𝛾 is a scaled version of the functional margin 𝛾̂. If ||w|| = 1, the geometric margin is the same as the functional margin, which is to say the functional margin itself already measures our confidence in this classification.
Dividing by ||w|| gives us a measure of confidence in our correctness, and we always try to maximise this confidence.
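A small sketch of 𝛾 = 𝛾̂ / ||w|| in plain Python; the helper names and example values are hypothetical. It also divides w and b by ||w||, which describes the same hyperplane but with ||w|| = 1, so that the two margins coincide.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean norm ||v||."""
    return math.sqrt(dot(v, v))


def functional_margin(w, b, x, y):
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||."""
    return functional_margin(w, b, x, y) / norm(w)


w, b = [3.0, 4.0], -5.0          # ||w|| = 5
x, y = [3.0, 4.0], 1             # w.x + b = 25 - 5 = 20

print(functional_margin(w, b, x, y))  # 20.0
print(geometric_margin(w, b, x, y))   # 4.0

# Same hyperplane with ||w|| == 1: both margins now agree (up to rounding).
n = norm(w)
w1, b1 = [wi / n for wi in w], b / n
print(functional_margin(w1, b1, x, y), geometric_margin(w1, b1, x, y))
```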
The sign of the functional margin is like a boolean variable: it tells us whether a particular data unit has been correctly classified or not. Its magnitude on its own is not meaningful, because it can be made arbitrarily large just by scaling w and b, so it is not a sensible quantity to maximise. The geometric margin for the same data unit, however, gives a magnitude to our confidence and tells us how correct we are, so this is what we can maximise.
We aim for a larger margin by maximising the geometric margin, because the wider the margin, the more confident we are in our classification.
As an analogy, a wider road (larger margin => higher geometric margin) gives us the confidence to drive much faster, as it lessens the chance of hitting any pedestrians or trees (our data units in the training set). On a narrower road (smaller margin => smaller geometric margin), one has to be a lot more cautious not to hit any pedestrians or trees (less confidence). So we always want wider roads (larger margins), and that's why we aim to maximise the margin by maximising the geometric margin.
Upvotes: 0
Reputation: 7130
The functional margin represents both the correctness and the confidence of a prediction, provided the magnitude of the vector w orthogonal to the hyperplane is held constant.
Regarding correctness: for a correctly classified sample the functional margin is positive, since wx + b is negative when y is -1 and positive when y is 1. If the functional margin is negative, the sample has been assigned to the wrong group.
Regarding confidence: the functional margin can change for two reasons: 1) the sample (y_i and x_i) changes, or 2) the vector w orthogonal to the hyperplane is scaled (by scaling w and b together). If w stays the same, whatever its magnitude, we can judge how confidently the point has been placed on the correct side: the larger the functional margin, the more confident we can be that the point is classified correctly.
But if the magnitude of w is not kept fixed, the functional margin can be changed just by rescaling, so we use the geometric margin as mentioned above: the functional margin of a training example is normalized by ||w||. With this normalization, the value of the geometric margin depends on the sample and the hyperplane itself, not on how w and b are scaled.
The geometric margin is invariant to rescaling of the parameters; this is the only difference between the geometric margin and the functional margin.
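A quick numerical check of this invariance, as a plain-Python sketch with made-up values: scaling (w, b) to (c·w, c·b) for c > 0 describes the same hyperplane, multiplies the functional margin by c, and leaves the geometric margin unchanged.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, x, y):
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


w, b = [1.0, 1.0], -1.0
x, y = [2.0, 1.0], 1

for c in (1.0, 10.0, 0.5):
    # (c*w, c*b) with c > 0: same hyperplane, same predictions.
    wc, bc = [c * wi for wi in w], c * b
    print(c, functional_margin(wc, bc, x, y), geometric_margin(wc, bc, x, y))
```

Here the functional margin comes out as 2c, while the geometric margin stays at √2 (about 1.414, up to rounding) for every c.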
EDIT:
The functional margin is introduced for two reasons: 1) it gives an intuition for maximizing the geometric margin, and 2) it lets us transform the geometric-margin maximization problem into the minimization of the magnitude of the vector w orthogonal to the hyperplane.
Scaling w and b together changes nothing meaningful (the hyperplane stays the same), and the functional margin scales by the same factor as the parameters. So, just as we could arbitrarily fix ||w|| = 1 (and then maximize the geometric margin), we can instead rescale the parameters so that the functional margin equals 1 (and then minimize ||w||).
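A sketch of that rescaling on a hypothetical, linearly separable toy dataset, assuming every point is correctly classified by the chosen (w, b). Dividing (w, b) by the smallest functional margin makes the closest point's functional margin exactly 1. The dataset's geometric margin does not change, and it now equals 1/||w||, so maximizing it amounts to minimizing ||w||.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def dataset_margins(w, b, data):
    """Smallest functional margin and smallest geometric margin over (x, y) pairs."""
    f = min(y * (dot(w, x) + b) for x, y in data)
    return f, f / norm(w)


# Hypothetical separable toy data; labels are in {-1, +1}.
data = [([2.0, 2.0], 1), ([3.0, 2.0], 1), ([0.0, 0.0], -1), ([-2.0, 0.0], -1)]
w, b = [2.0, 2.0], -4.0          # a separating (w, b), chosen by hand

f_hat, gamma = dataset_margins(w, b, data)
if not f_hat > 0:
    raise ValueError("rescaling assumes every point is correctly classified")

# Rescale so that the closest point has functional margin exactly 1.
w1, b1 = [wi / f_hat for wi in w], b / f_hat
f1, gamma1 = dataset_margins(w1, b1, data)

print(f_hat, gamma)               # original scale
print(f1, gamma1, 1 / norm(w1))   # functional margin 1; geometric margin = 1/||w||
```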
Upvotes: 10
Reputation: 77454
Check Andrew Ng's Lecture Notes from Lecture 3 on SVMs (notation changed to make it easier to type without mathjax/TeX on this site):
"Let’s formalize the notions of the functional and geometric margins. Given a training example (x_i, y_i), we define the functional margin of (w, b) with respect to the training example

gamma_i = y_i( (w^T)x_i + b )

Note that if y_i > 0, then for the functional margin to be large (i.e., for our prediction to be confident and correct), we need (w^T)x + b to be a large positive number. Conversely, if y_i < 0, then for the functional margin to be large, we need (w^T)x + b to be a large negative number. Moreover, if y_i( (w^T)x_i + b) > 0, then our prediction on this example is correct. (Check this yourself.) Hence, a large functional margin represents a confident and a correct prediction."
From page 3 of the Lecture 3 PDF, available from the materials page linked above.
Upvotes: 3
Reputation: 3823
"A geometric margin is simply the Euclidean distance between a certain x (data point) and the hyperplane."
I don't think that is a proper definition for the geometric margin, and I believe that is what is confusing you. The geometric margin is just a scaled version of the functional margin.
You can think of the functional margin as a test function that tells you whether a particular point is properly classified or not. The geometric margin is the functional margin divided by ||w||.
If you check the formula for the functional margin, y(w^T x + b):
Notice that, regardless of the label, the result is positive for properly classified points (e.g. sign(1*5) = 1 and sign(-1*-5) = 1) and negative otherwise. If you divide it by ||w||, you get the geometric margin.
Why does the geometric margin exist?
Well, to maximize the margin you need more than just the sign; you need a notion of magnitude. The functional margin gives you a number, but without a reference scale you can't tell whether the point is actually far from or close to the decision plane, because that number changes whenever w and b are rescaled. The geometric margin tells you not only whether the point is properly classified, but also the magnitude of its distance from the decision plane, normalized by ||w||.
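To make "the magnitude of that distance" concrete, here is a plain-Python sketch with illustrative helper names and values. It projects a point onto the hyperplane w·x + b = 0 and compares the length of that step with the geometric margin. For a correctly classified point the two agree; for a misclassified point the geometric margin is the negative of the distance.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def geometric_margin(w, b, x, y):
    return y * (dot(w, x) + b) / norm(w)


def project_onto_hyperplane(w, b, x):
    """Closest point to x on the hyperplane w.x + b = 0."""
    t = (dot(w, x) + b) / dot(w, w)
    return [xi - t * wi for xi, wi in zip(x, w)]


w, b = [1.0, 2.0], -3.0
x = [4.0, 3.0]                     # w.x + b = 7 > 0, so the prediction is +1

p = project_onto_hyperplane(w, b, x)
distance = norm([xi - pi for xi, pi in zip(x, p)])

print(dot(w, p) + b)               # close to 0 (up to rounding): p is on the hyperplane
print(distance, geometric_margin(w, b, x, 1))    # equal: x correctly classified
print(geometric_margin(w, b, x, -1))             # -distance: x misclassified
```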
Upvotes: 42