Daniel

Reputation: 7714

Correct implementation of weighted K-Nearest Neighbors

From what I understood, the classical KNN algorithm works like this (for discrete data):

    f(x_q) = argmax over classes v of  sum_{i=1..k} δ(v, y_i)

where y_i is the class label of the i-th nearest neighbor and δ(a, b) = 1 if a = b, 0 otherwise.
How would I introduce weights on this classic KNN? I read that more importance should be given to nearer points, and I read this, but couldn't understand how it would apply to discrete data.
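To make my understanding concrete, here is a rough sketch of the classic (unweighted) version in Python; the function and variable names are just made up for illustration:

    from collections import Counter

    def knn_classify(query, data, labels, k):
        # squared Euclidean distance from the query to every training point
        dists = [sum((a - b) ** 2 for a, b in zip(query, x)) for x in data]
        # indices of the k nearest training points
        nearest = sorted(range(len(data)), key=lambda j: dists[j])[:k]
        # argmax over classes: the most common label among the k neighbors
        return Counter(labels[j] for j in nearest).most_common(1)[0][0]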

First of all, using argmax doesn't make sense to me, and if the weight acts by increasing the distance, then it would only make the distance worse. Sorry if I'm talking nonsense.

Upvotes: 2

Views: 2065

Answers (2)

Prune

Reputation: 77850

Consider a simple example with three classifications (red, green, blue) and the six nearest neighbors denoted by R, G, B. I'll make this linear to simplify visualization and arithmetic:

R B G x G R R

The points listed with distance are

class dist
  R     3
  B     2
  G     1
  G     1
  R     2
  R     3

Thus, if we're using unweighted nearest neighbors, the simple "voting" algorithm is 3-2-1 in favor of Red. However, with the weighted influences, we have ...

red_total   = 1/3^2 + 1/2^2 + 1/3^2 = 1/9 + 1/4 + 1/9 ~= 0.47
blue_total  = 1/2^2                                    =  0.25
green_total = 1/1^2 + 1/1^2                            =  2.00

... and x winds up as Green due to proximity.
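To make the arithmetic concrete, the same tally can be sketched in Python (the neighbor list below is just the example above, with each vote weighted by 1/d^2):

    from collections import defaultdict

    # the six nearest neighbors from the example: (class, distance to x)
    neighbors = [("R", 3), ("B", 2), ("G", 1), ("G", 1), ("R", 2), ("R", 3)]

    totals = defaultdict(float)
    for label, dist in neighbors:
        totals[label] += 1.0 / dist ** 2   # each vote counts 1/d^2

    print(dict(totals))                 # {'R': 0.47..., 'B': 0.25, 'G': 2.0}
    print(max(totals, key=totals.get))  # 'G' -- x winds up Green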

That lowercase-delta function (δ) is merely the classification function; in this simple example, it returns red | green | blue. In a more complex example, ... well, I'll leave that to later tutorials.

Upvotes: 2

Srini

Reputation: 1649

Okay, off the bat let me say I am not a fan of the link you provided: it uses image equations, and the notation in the images differs from the notation in the text.


So leaving that aside, let's look at the regular k-NN algorithm. Regular k-NN is actually just a special case of weighted k-NN: you assign a weight of 1 to the k nearest neighbors and 0 to the rest.

  1. Let w_qj denote the weight associated with a point j relative to a point q.
  2. Let y_j be the class label associated with the data point j. For simplicity, let us assume we are classifying birds as either crows, hens, or turkeys => discrete classes. So for all j, y_j ∈ {crow, hen, turkey}.
  3. A good weight metric is the inverse of the distance, whatever the distance may be: Euclidean, Mahalanobis, etc.
  4. Given all this, the class label y_q you would associate with the point q you are trying to predict would be the sum of the w_qj · y_j terms divided by the sum of all weights. You do not have to do the division if you normalize the weights first. (Here each y_j acts as an indicator, so each neighbor adds its weight to the running total of its own class.)
  5. You would end up with an equation as follows: somevalue1 · crow + somevalue2 · hen + somevalue3 · turkey.
  6. One of these classes will have a higher somevalue. The class with the highest value is what you will predict for point q; see the sketch after this list.
  7. For the purpose of training, you can factor in the error any way you want. Since the classes are discrete, there are a limited number of simple ways you can adjust the weights to improve accuracy.
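Putting steps 1-6 together, a minimal sketch in Python might look like this (the names are my own; inverse-distance weights follow step 3, and weighted=False recovers the regular 0/1 special case):

    import math
    from collections import defaultdict

    def weighted_knn_predict(query, data, labels, k, weighted=True):
        # distance from the query point q to every training point j
        dists = [math.dist(query, x) for x in data]
        # indices of the k nearest neighbors
        nearest = sorted(range(len(data)), key=lambda j: dists[j])[:k]

        # w_qj = 1/d(q, j) for weighted k-NN; w_qj = 1 gives regular k-NN
        scores = defaultdict(float)
        for j in nearest:
            w = 1.0 / (dists[j] + 1e-12) if weighted else 1.0
            scores[labels[j]] += w   # each neighbor votes for its own class

        # dividing by the total weight would not change the argmax,
        # so the normalization step can be skipped
        return max(scores, key=scores.get)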

Upvotes: 1
