Reputation: 33
I'm using the distance.cosine function from the scipy.spatial Python package.
The problem is that my code returns some values that are greater than one.
How is that possible?
My code is very simple; here it is:
import numpy as np
from scipy.spatial import distance

words = vec.split(",")
for i, w1 in enumerate(words):
    # Normalize the phrase vector for w1 to unit length
    vec_1 = embedding.get_phrase_vector(w1)
    vec_1 = vec_1 / np.linalg.norm(vec_1)
    for j, w2 in enumerate(words):
        # Normalize the phrase vector for w2 to unit length
        vec_2 = embedding.get_phrase_vector(w2)
        vec_2 = vec_2 / np.linalg.norm(vec_2)
        matrix[i][j] = distance.cosine(vec_1, vec_2)
The two vectors giving me problems are:
w1=[-0.29137 1.0635 -0.41772 0.10439 0.46724 0.28249
-0.04234 -0.07716 0.31482 -0.31903 -0.15905 0.98593
0.40408 -0.33376 0.11372 0.3485 0.28884 0.082693
0.86843 -0.40946 -0.64101 -0.55062 0.15105 -0.16613
0.88421 0.31586 0.0017234 -0.46789 -0.48933 -0.38975
-0.48061 -0.086691 0.96367 0.13027 0.10883 0.13111
-0.28605 0.32731 0.10249 -0.50631 -0.27578 0.053391
0.45665 -0.11782 0.039271 0.27073 0.46305 0.66542
-0.41682 -0.14791 -0.9136 -0.71694 -0.11963 0.095209
0.21016 0.67604 -0.23403 -0.39308 0.34853 -0.91753
0.73017 0.79334 -0.25474 0.51577 -1.0458 -0.59653
-0.54101 -0.056912 0.01262 0.046881 0.0708 0.20313
-0.34206 -0.62316 -0.48464 0.013741 0.057855 -0.29289
-0.1755 0.059357 -0.01446 0.17238 0.065214 0.4437
0.38186 -0.21588 0.55824 0.099175 -0.0094545 0.82726
-0.4048 -0.47035 -0.16345 0.080469 -0.048781 0.091551
0.67828 -0.56955 -0.024643 -0.51526 ]
w2=[-1.6486e-01 9.1997e-01 2.2737e-01 -4.9031e-01 -1.8082e-03 -3.3803e-01
5.7221e-02 1.4601e-01 4.0202e-01 -2.8858e-01 -4.7495e-01 -5.6369e-01
2.7037e-01 5.1702e-01 -1.1241e-01 1.8314e-01 2.2066e-01 -4.8606e-01
-8.7284e-01 -6.2587e-02 4.3016e-02 2.3641e-01 5.9705e-01 -3.8640e-01
-2.5194e-01 9.6862e-01 -4.3112e-01 -4.8370e-01 -1.1396e+00 9.2425e-02
-1.1476e-01 -7.4291e-02 -6.2524e-02 -9.5122e-02 -2.2714e-01 8.8291e-01
3.9978e-01 7.6631e-01 -6.7697e-01 -6.2829e-01 -1.1872e-01 -2.4492e-01
-5.8893e-01 -8.5088e-01 1.1107e+00 4.2190e-01 -1.5072e+00 -1.9509e-01
-2.6712e-01 -7.0801e-01 5.5075e-01 -4.6929e-02 -2.5203e-01 7.4411e-01
-1.8325e-01 -1.4885e+00 -4.6393e-01 -1.0338e-01 2.3525e+00 -1.5421e-01
3.9833e-01 1.5344e-02 8.0708e-02 -2.7373e-01 9.7057e-01 -1.9383e-02
2.0899e-01 -6.4033e-01 9.2509e-01 -4.5371e-01 -7.0564e-01 -1.6033e-01
-7.1761e-02 6.2856e-01 3.5732e-01 8.8802e-01 -6.9127e-01 4.9634e-02
-9.3347e-01 6.5396e-01 3.7165e-01 5.8363e-02 -1.0152e+00 7.0845e-01
-1.3542e+00 -3.6390e-01 2.5994e-01 -1.8260e-01 -9.8930e-01 -4.4699e-01
8.5016e-01 9.4532e-02 3.7019e-01 -5.0354e-01 -1.2083e+00 -3.5776e-01
2.3899e-01 -6.7904e-02 1.5072e+00 6.0889e-01]
and their distance comes out as 1.08074426763993081
Upvotes: 3
Views: 5516
Reputation: 9941
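If the dot product of these vectors is negative, it's perfectly OK for cosine
to return a value greater than 1. The function computes the cosine distance,
1 - (u·v)/(‖u‖‖v‖), not the cosine similarity, so its result ranges from 0 to 2
(see the formula used for cosine in the documentation).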
For example:
from scipy.spatial.distance import cosine
cosine([1], [-1])
Output:
2.0
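To see where the 0 to 2 range comes from, here is a minimal sketch that writes the documented formula out by hand in plain Python. The helper name cosine_distance is my own, not part of scipy, and it only illustrates the formula rather than reproducing scipy's implementation:

```python
import math

def cosine_distance(u, v):
    """Hypothetical helper: 1 - (u . v) / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Same direction: similarity is 1, so the distance is 0
same = cosine_distance([1.0, 0.0], [2.0, 0.0])
# Orthogonal: similarity is 0, so the distance is 1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
# Opposite direction: similarity is -1, so the distance is 2
opposite = cosine_distance([1.0, 0.0], [-4.0, 0.0])

print(same, orthogonal, opposite)
```

Any distance above 1, such as the 1.0807 in the question, simply means the two vectors point slightly away from each other, i.e. their dot product is negative. Normalizing the vectors beforehand does not change this, since the formula already divides by both norms.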
Upvotes: 4