Reputation: 489
I'm studying a bit of ML and I've got stuck on some questions, especially about the perceptron. For example:
We can see the bias b and the weights w as the coefficients of our linear separator, right? Is this valid only if we are in 2D, where the linear separator is a line?
Our goal is to create a line that exactly divides the data points in our training data, right? That means, at the end of the learning phase, the algorithm has "discovered" the line (if we are in 2D) that best separates the two kinds of points. This is possible because each training point comes with its correct label y, so the algorithm can measure the difference between the real label and the predicted one.
Moving to the test phase: test points come without labels, so in my understanding the perceptron only checks whether a test point lies above or below the learned line. Is that what produces the classification?
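To make my question concrete, here is how I picture the test-phase classification in code (just a sketch of my own understanding, with made-up numbers for w and b):

    def predict(x, w, b):
        # Which side of the learned separator does x fall on?
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if activation >= 0 else -1

    # Hypothetical learned parameters and a test point:
    print(predict([2.0, 1.0], [0.5, -1.0], 0.3))  # 0.5*2 - 1.0*1 + 0.3 = 0.3 >= 0, so 1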
Some people also use this notation for the threshold activation function.
Is it the same as the other one that uses the error difference? If I'm not wrong, this is used for -1/+1 classes. In other words, are we relating the label y_i of my observation i to the output value of the perceptron?
Upvotes: 0
Views: 1002
Reputation: 1866
Since I can't add a comment, I'm using this answer to address rollotommasi's follow-up question on deepideas' answer.
The perceptron will only find the best solution if your training data is linearly separable, which means that the best solution it can reach is also the optimal one: it separates the training data perfectly.
So, if the returned line is "valid" for the training data, how can it also be valid for the test data?
As you said, to classify new data the perceptron checks whether that data lies above or below the line. The underlying assumption is that your training set is representative of the whole distribution, so new data (test data) won't diverge much from the training set.
Imagine the exclusive-or problem for 2-feature vectors: consider only the sign of each feature, and whenever the two signs are the same the output class is 1, otherwise it is 0. Your data is then divided into 4 quadrants, and a single-layer perceptron can't find the optimal solution for it.
Now suppose your training set only contains data from the first and second quadrants. For this training set, a single-layer perceptron would find the optimal solution, dividing the two quadrants along the y-axis. But then, when testing the model on the remaining data, it would get everything wrong.
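Here's a small sketch of that experiment (it assumes numpy and the classic perceptron update rule; the specific numbers are made up, only the quadrant structure matters):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n, x2_low, x2_high):
        # Keep |x1| >= 0.1 so the training classes have a clear margin.
        x1 = rng.choice([-1, 1], n) * rng.uniform(0.1, 1.0, n)
        x2 = rng.uniform(x2_low, x2_high, n)
        return np.column_stack([x1, x2])

    def label(X):
        # XOR-style rule: class 1 when the two feature signs match, else 0.
        return (np.sign(X[:, 0]) == np.sign(X[:, 1])).astype(int)

    X_train = sample(200, 0.1, 1.0)     # quadrants I and II only (x2 > 0)
    X_test  = sample(200, -1.0, -0.1)   # quadrants III and IV (x2 < 0)
    y_train, y_test = label(X_train), label(X_test)

    # Classic perceptron learning rule.
    w, b = np.zeros(2), 0.0
    for _ in range(100):
        for x, y in zip(X_train, y_train):
            pred = int(np.dot(w, x) + b >= 0)
            w += (y - pred) * x
            b += y - pred

    def accuracy(X, y):
        return np.mean(((X @ w + b) >= 0).astype(int) == y)

    print("train accuracy:", accuracy(X_train, y_train))  # ~1.0: separable by the y-axis
    print("test accuracy:", accuracy(X_test, y_test))     # far below 0.5: rule is inverted

On the upper quadrants, class 1 is exactly x1 > 0, so the training data is linearly separable and the perceptron converges. On the lower quadrants the correct rule is the opposite (class 1 is x1 < 0), so the learned separator misclassifies most of the test set.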
Upvotes: 0
Reputation: 51
1) w and b are the coefficients of a linear separator, regardless of the dimension. w and b jointly represent the set of points where w^T x + b = 0. w has the same dimensionality as x, and b is always a scalar.
This set of points separates the space into two regions. In the case of 2 dimensions, the set of points corresponds to a line. In the case of 3 dimensions, it would correspond to a plane. In higher dimensions, you can't really visualize it, but it still works just the same. One refers to it as a hyperplane in general.
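To make that concrete, here is a minimal sketch (assuming numpy): the decision rule w^T x + b >= 0 is written once and works unchanged whether x is 2-dimensional or 5-dimensional.

    import numpy as np

    def classify(w, b, x):
        # The sign of w^T x + b tells you which side of the hyperplane x lies on.
        return 1 if np.dot(w, x) + b >= 0 else -1

    # 2D: the separator is a line.
    print(classify(np.array([1.0, -2.0]), 0.5, np.array([3.0, 1.0])))   # 1

    # 5D: the separator is a 4-dimensional hyperplane; the code is identical.
    w = np.array([0.2, -1.0, 0.7, 0.0, 3.1])
    print(classify(w, -0.3, np.array([1.0, 0.5, -2.0, 4.0, 0.1])))      # -1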
2) Partly correct. The test data is there to check how well your perceptron performs. You can't know how well it performs unless you know the true classes of the test data. What you usually do is measure what percentage of the test data your perceptron classifies correctly (known as its accuracy). However, the test data does not influence the perceptron; it's only there to test it.
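Computing the accuracy is then just the fraction of matching predictions; for example (a tiny sketch with made-up labels):

    import numpy as np

    y_true = np.array([1, -1, 1, 1, -1])   # known labels of the test data
    y_pred = np.array([1, -1, -1, 1, -1])  # what the perceptron predicted
    print(np.mean(y_true == y_pred))       # accuracy: 0.8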
3) That's an unusual notation; you should provide some context, otherwise I can't tell you what it's supposed to represent.
Upvotes: 1