Reputation: 1607
I previously asked for an explanation of linearly separable data. I'm still working through Mitchell's Machine Learning book, and I'm having trouble understanding exactly why the perceptron rule only works for linearly separable data.
Mitchell defines a perceptron as follows:
That is, the output y is 1 if the weighted sum of the inputs exceeds some threshold, and -1 otherwise.
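For reference, the book's definition is roughly: o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + ... + w_n x_n > 0, and -1 otherwise; more compactly, o(x) = sgn(w · x) with the convention x_0 = 1.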
Now, the problem is to determine a weight vector that causes the perceptron to produce the correct output (1 or -1) for each of the given training examples. One way of achieving this is through the perceptron rule:
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i according to the rule:
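The rule referred to above is, in Mitchell's notation (with learning rate η, target output t, and actual perceptron output o): w_i ← w_i + Δw_i, where Δw_i = η (t - o) x_i.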
So, my question is: Why does this only work with linearly separable data? Thanks.
Upvotes: 3
Views: 2062
Reputation: 6079
Because the dot product of w and x is a linear combination of the x_i's; you are, in fact, splitting your data into 2 classes using the hyperplane a_1 x_1 + … + a_n x_n > 0.

Consider a 2D example: X = (x, y) and W = (a, b); then X · W = a*x + b*y. sgn returns 1 if its argument is greater than 0, so for class #1 you have a*x + b*y > 0, which is equivalent to y > -(a/b)*x (assuming b != 0; if b < 0 the inequality flips, but the boundary is the same line). That equation is linear, and it divides the 2D plane into 2 parts.
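To make this concrete, here is a minimal sketch of the perceptron training rule in Python (not part of the original answer; the AND/XOR toy data and function names are purely illustrative). On the AND function, which is linearly separable, the rule stops updating after a few passes; on XOR, which no line can separate, some example is always misclassified, the weights keep changing, and training never terminates on its own.

# Minimal perceptron-rule sketch (illustrative; each input carries a constant 1
# so that w[0] plays the role of the threshold weight w_0).
def sgn(z):
    return 1 if z > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=100):
    # examples: list of (inputs, target) pairs with targets in {+1, -1}
    n = len(examples[0][0])
    w = [0.0] * n
    for epoch in range(max_epochs):
        errors = 0
        for x, t in examples:
            o = sgn(sum(wi * xi for wi, xi in zip(w, x)))
            if o != t:                          # misclassified: apply w_i += eta*(t - o)*x_i
                errors += 1
                for i in range(n):
                    w[i] += eta * (t - o) * x[i]
        if errors == 0:                         # every example classified correctly
            return w, epoch
    return w, None                              # never converged within max_epochs

# AND is linearly separable: training terminates with a separating line.
AND = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), 1)]
# XOR is not linearly separable: no weight vector classifies all four points.
XOR = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), -1)]

print(train_perceptron(AND))   # weights plus the epoch at which it converged
print(train_perceptron(XOR))   # weights plus None: it keeps misclassifying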
Upvotes: 6