SriK

Reputation: 1121

Why doesn't scikit-learn's DistanceMetric class have cosine distance?

I am trying to use KNN with cosine distance, but it looks like the metric parameter does not accept cosine distance. Only the metrics below are listed at http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html. Why is that?

Metrics intended for real-valued vector spaces:

identifier      class name            args      distance function
“euclidean”     EuclideanDistance               sqrt(sum((x - y)^2))
“manhattan”     ManhattanDistance               sum(|x - y|)
“chebyshev”     ChebyshevDistance               max(|x - y|)
“minkowski”     MinkowskiDistance     p         sum(|x - y|^p)^(1/p)
“wminkowski”    WMinkowskiDistance    p, w      sum(w * |x - y|^p)^(1/p)
“seuclidean”    SEuclideanDistance    V         sqrt(sum((x - y)^2 / V))
“mahalanobis”   MahalanobisDistance   V or VI   sqrt((x - y)' V^-1 (x - y))

Metrics intended for two-dimensional vector spaces:

identifier      class name            distance function
“haversine”     HaversineDistance     2 arcsin(sqrt(sin^2(0.5*dx) + cos(x1)cos(x2)sin^2(0.5*dy)))

Upvotes: 4

Views: 2283

Answers (1)

SriK

Reputation: 1121

Cosine distance isn't a proper distance metric, in the sense that it doesn't satisfy the triangle inequality. It is derived from the angle between two vectors and doesn't represent a shortest distance in any sense. This is described well at https://en.wikipedia.org/wiki/Cosine_similarity. For K-Means, or any algorithm that relies on the properties of a true distance metric, satisfying the metric requirements (https://en.wikipedia.org/wiki/Metric_(mathematics)) is necessary.
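
To make the triangle inequality point concrete, here is a minimal sketch in plain Python. It assumes cosine distance is defined as 1 minus cosine similarity; the helper `cosine_distance` and the three example vectors are my own illustration, not anything from scikit-learn.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical example points: y lies halfway (in angle) between x and z.
x, y, z = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

d_xz = cosine_distance(x, z)  # direct "distance" from x to z
d_xy = cosine_distance(x, y)  # first leg of the detour through y
d_yz = cosine_distance(y, z)  # second leg of the detour through y

# A true metric requires d(x, z) <= d(x, y) + d(y, z).
triangle_holds = d_xz <= d_xy + d_yz
print(d_xz, d_xy + d_yz, triangle_holds)
```

For these vectors the detour through y comes out shorter than the direct route, so `triangle_holds` is False, which is exactly the violation described above.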

Upvotes: 1
