Reputation: 1121
I am trying to use KNN with cosine distance, but it looks like the metric parameter does not accept cosine distance. Only the metrics below are listed at http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html . Why is that?
Metrics intended for real-valued vector spaces:

identifier      class name            args       distance function
“euclidean”     EuclideanDistance                sqrt(sum((x - y)^2))
“manhattan”     ManhattanDistance                sum(|x - y|)
“chebyshev”     ChebyshevDistance                max(|x - y|)
“minkowski”     MinkowskiDistance     p          sum(|x - y|^p)^(1/p)
“wminkowski”    WMinkowskiDistance    p, w       sum(w * |x - y|^p)^(1/p)
“seuclidean”    SEuclideanDistance    V          sqrt(sum((x - y)^2 / V))
“mahalanobis”   MahalanobisDistance   V or VI    sqrt((x - y)' V^-1 (x - y))
Metrics intended for two-dimensional vector spaces:

identifier      class name            distance function
“haversine”     HaversineDistance     2 arcsin(sqrt(sin^2(0.5*dx) + cos(x1)cos(x2)sin^2(0.5*dy)))
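
For reference, the real-valued formulas listed above can be sketched in plain Python. This is only an illustration of the formulas, not scikit-learn's implementation; the function names and the sample vectors are made up for this example.

```python
import math

# Illustrative helpers (hypothetical names, not part of scikit-learn).
def euclidean(x, y):
    # sqrt(sum((x - y)^2))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # sum(|x - y|)
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # max(|x - y|)
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # sum(|x - y|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Sample vectors chosen so the coordinate differences are 3, 4 and 0.
x = (1.0, 2.0, 3.0)
y = (4.0, 6.0, 3.0)

d_euclidean = euclidean(x, y)
d_manhattan = manhattan(x, y)
d_chebyshev = chebyshev(x, y)
d_minkowski_2 = minkowski(x, y, 2)

print(d_euclidean, d_manhattan, d_chebyshev, d_minkowski_2)
```

With p = 2 the Minkowski formula reduces to the Euclidean one, and with p = 1 to the Manhattan one.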
Upvotes: 4
Views: 2283
Reputation: 1121
Cosine distance isn't a proper distance in the sense that it doesn't satisfy the triangle inequality. It is based on the angle between two vectors and doesn't represent a shortest distance in any sense per se. This is described well here: https://en.wikipedia.org/wiki/Cosine_similarity . For K-Means or any distance-based similarity algorithm, satisfying the distance metric requirements (https://en.wikipedia.org/wiki/Metric_(mathematics)) is necessary.
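
A small sketch of the triangle-inequality point, in plain Python. It assumes cosine distance is defined as 1 minus the cosine similarity; the helper names and the three sample vectors are made up for this illustration.

```python
import math

def cosine_distance(u, v):
    # Assumed definition: 1 - cos(angle between u and v).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def unit(u):
    # Scale u to unit length.
    n = math.sqrt(sum(a * a for a in u))
    return tuple(a / n for a in u)

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# b sits halfway (in angle) between a and c.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

d_ac = cosine_distance(a, c)
d_ab = cosine_distance(a, b)
d_bc = cosine_distance(b, c)

# A metric would require d(a, c) <= d(a, b) + d(b, c); here that fails.
violates_triangle = d_ac > d_ab + d_bc
print(d_ac, d_ab + d_bc, violates_triangle)

# For unit-length vectors, squared Euclidean distance equals
# 2 * cosine distance, so both give the same neighbor ordering.
sq_ab = squared_euclidean(unit(a), unit(b))
print(sq_ab, 2 * d_ab)
```

Going directly from a to c costs more than going through b, which a true metric never allows. The last two lines suggest one possible workaround, offered as a sketch rather than a recommendation from the answer above: normalize the vectors to unit length and use the Euclidean metric, which ranks neighbors in the same order as cosine distance.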
Upvotes: 1