Cosine distance between identical vectors is not equal to 0

Question

I'm trying to retrieve the nearest neighbors of a vector from a list of vectors, using:

neigh = NearestNeighbors(metric='cosine')

neigh.fit(list)

From what I've read and observed, if vector1 and vector2 have exactly the same value in every dimension, the distance between them should be 0. I'm using the kneighbors method to find the distance.

neigh.kneighbors(vector_input)

However, in some cases (not all), even when both vectors are equal, the returned distance is not exactly 0 but a tiny number such as 2.34e-16.

len([i for i, j in zip(vector_from_list, vector_input) if i == j]) returns the dimension of the vectors, meaning that every element is equal to the element at the same index in the other vector. So, unless I'm mistaken, the vectors are exactly equal.

The dtype of all vectors is np.float64.

Is the distance computation inconsistent? Or did I overlook something (a parameter, for example) in the scikit-learn method?

DrGeneral · Accepted Answer

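I think that's expected behavior. The cosine distance is computed with floating-point arithmetic (one minus a normalized dot product), so rounding error can leave a tiny nonzero value such as 2.34e-16 even when the two vectors are identical.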
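To see where the tiny value can come from, here is a minimal sketch that computes the cosine distance by hand with NumPy. The helper cosine_distance and the random test vector are my own illustration, not scikit-learn's internal code, so the exact value you get may differ from what kneighbors returns.

```python
import numpy as np

def cosine_distance(u, v):
    # Hypothetical helper: 1 - cos(angle between u and v), computed in float64.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
v = rng.random(300)                # arbitrary float64 vector
d = cosine_distance(v, v.copy())   # distance between a vector and an identical copy

print(d)                    # exactly 0.0 or a tiny rounding residue, depending on the data
print(np.isclose(d, 0.0))   # True
```

The dot product and the product of the norms are rounded independently, so their ratio is not guaranteed to be exactly 1.0.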

If you need to test whether a distance is zero, consider using numpy.isclose instead of ==. For example:

import numpy as np

a = 2.34e-16
b = 1.7e-14  # both tiny values, almost zero
print(a == b)  # prints False
print(np.isclose(a, b))  # prints True

You can control how close the values must be with the function's rtol and atol parameters. See the documentation for more.

Alternatively, you can use Python's standard-library function math.isclose. Its abs_tol defaults to 0, so pass an explicit abs_tol when comparing values near zero. See the documentation. Example:

import math

a = 2.34e-16
b = 1.7e-14  # both tiny values, almost zero
print(math.isclose(a, b, abs_tol=1e-10))  # True
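Applied to the original problem, the same idea can be used on the whole distance array at once. This is only a sketch: the distances array below is made-up data standing in for the first value returned by neigh.kneighbors(vector_input), and the tolerance 1e-12 is an assumption you should tune to your data.

```python
import numpy as np

# Hypothetical values standing in for the distances returned by kneighbors
# (shape: n_queries x n_neighbors).
distances = np.array([[2.34e-16, 0.12, 0.57]])

# Treat anything within the absolute tolerance of 0 as an exact match.
is_match = np.isclose(distances, 0.0, atol=1e-12)

print(is_match)  # first entry is True, the other two are False
```

Because the comparison target is 0, only atol matters here; the relative tolerance rtol is multiplied by the target and contributes nothing.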
