Intuition behind Kernelized perceptron

Question

I understand the derivation of the kernelized perceptron function, but I'm trying to figure out the intuition behind the final formula:

f(X) = sum_i (alpha_i*y_i*K(X,x_i))

Where (x_i,y_i) are all the samples in the training data, alpha_i is the number of times we've made a mistake on that sample, and X is the sample we're trying to predict (during training or otherwise). Now, I understand why the kernel function is considered to be a measure of similarity (since it's a dot product in a higher dimensional space), but what I don't get is how this formula comes together.

My original attempt was that we're trying to predict a sample based on how similar it is to the other samples - and multiply it by y_i so that it contributes the correct sign (points that are closer are better indicators of the label than points that are farther). But why should a sample that we've made several mistakes on contribute more?

tl;dr: In a Kernelized perceptron, why should a sample that we've made several mistakes on contribute more to the prediction than ones we haven't made mistakes on?

Pedrom · Accepted Answer

My original attempt was that we're trying to predict a sample based on how similar it is to the other samples - and multiply it by y_i so that it contributes the correct sign (points that are closer are better indicators of the label than points that are farther).

This is pretty much what's going on. The key idea is that if a training point x_i is already classified correctly, i.e. f(x_i) has the same sign as y_i, then you don't need to update anything for it.

But if the point is misclassified, we need to update. The best update is in the opposite direction of the error, right? That is, if the result is negative (and it is misclassified), we should add a positive quantity (y_i). If the result is positive (and it is misclassified), we want to add a negative quantity (y_i again).
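To make the "opposite direction" claim concrete, here is a short worked step. It assumes the usual update alpha_i <- alpha_i + 1 on a mistake and labels y_i in {-1, +1}; f' denotes the function after the update and phi is the feature map implied by K (both are notation introduced here, not in the question).

```latex
\begin{aligned}
f(x_i)      &= \sum_j \alpha_j \, y_j \, K(x_i, x_j) \\
f'(x_i)     &= f(x_i) + y_i \, K(x_i, x_i) \\
y_i f'(x_i) &= y_i f(x_i) + y_i^2 \, K(x_i, x_i)
             = y_i f(x_i) + K(x_i, x_i), \qquad
K(x_i, x_i) = \lVert \phi(x_i) \rVert^2 \ge 0
\end{aligned}
```

So each mistake on x_i can only move y_i*f(x_i) toward the positive (correctly classified) side, whichever sign the error had.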

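As you can see, y_i already gives us the right update direction, so we use the misclassification counter alpha_i to give that update a magnitude. A sample with a large alpha_i is one the classifier kept getting wrong, so it needed more pushes of y_i*K(X,x_i) before f agreed with its label.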
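The role of alpha_i as a mistake counter can be seen in a minimal sketch of the training loop. This is an illustration under assumptions, not a reference implementation: the RBF kernel, the `gamma` value, the function names, and the XOR-style toy data are all choices made here, and a result of exactly zero is treated as a mistake.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Assumed kernel for illustration; any valid kernel K(a, b) would do.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train_kernel_perceptron(X, y, kernel, epochs=10):
    n = len(X)
    alpha = np.zeros(n, dtype=int)  # alpha[i] counts mistakes made on sample i
    # Precompute the Gram matrix K[i, j] = kernel(X[i], X[j]).
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # f(x_i) = sum_j alpha_j * y_j * K(x_i, x_j)
            f_i = np.sum(alpha * y * K[:, i])
            if y[i] * f_i <= 0:   # wrong sign (or zero): a mistake
                alpha[i] += 1     # adds one more y_i * K(., x_i) term to f
                mistakes += 1
        if mistakes == 0:         # a full pass without mistakes: stop
            break
    return alpha

def predict(x_new, X, y, alpha, kernel):
    f = sum(a * yi * kernel(x_new, xi) for a, yi, xi in zip(alpha, y, X))
    return 1 if f > 0 else -1

# Toy XOR-style data (hypothetical), not linearly separable in input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])
alpha = train_kernel_perceptron(X, y, rbf_kernel)
```

Samples that are never misclassified keep alpha_i = 0 and drop out of the sum entirely, while the ones that needed repeated corrections carry proportionally more weight in f(X).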
