Barbamento

Reputation: 33

Cosine distance more than 1

I'm using the distance.cosine function from the scipy.spatial Python package. The problem is that my code returns some values greater than one. How is that possible?

My code is very simple:

import numpy as np
from scipy.spatial import distance

# vec, embedding and matrix are defined elsewhere in my code
words = vec.split(",")
for i, w1 in enumerate(words):
    # Normalize the first phrase vector to unit length
    v1 = embedding.get_phrase_vector(w1)
    vec_1 = v1 / np.linalg.norm(v1)
    for j, w2 in enumerate(words):
        # Normalize the second phrase vector to unit length
        v2 = embedding.get_phrase_vector(w2)
        vec_2 = v2 / np.linalg.norm(v2)
        matrix[i][j] = distance.cosine(vec_1, vec_2)

The two vectors giving me problems are:

w1=[-0.29137    1.0635    -0.41772    0.10439    0.46724    0.28249
 -0.04234   -0.07716    0.31482   -0.31903   -0.15905    0.98593
  0.40408   -0.33376    0.11372    0.3485     0.28884    0.082693
  0.86843   -0.40946   -0.64101   -0.55062    0.15105   -0.16613
  0.88421    0.31586    0.0017234 -0.46789   -0.48933   -0.38975
 -0.48061   -0.086691   0.96367    0.13027    0.10883    0.13111
 -0.28605    0.32731    0.10249   -0.50631   -0.27578    0.053391
  0.45665   -0.11782    0.039271   0.27073    0.46305    0.66542
 -0.41682   -0.14791   -0.9136    -0.71694   -0.11963    0.095209
  0.21016    0.67604   -0.23403   -0.39308    0.34853   -0.91753
  0.73017    0.79334   -0.25474    0.51577   -1.0458    -0.59653
 -0.54101   -0.056912   0.01262    0.046881   0.0708     0.20313
 -0.34206   -0.62316   -0.48464    0.013741   0.057855  -0.29289
 -0.1755     0.059357  -0.01446    0.17238    0.065214   0.4437
  0.38186   -0.21588    0.55824    0.099175  -0.0094545  0.82726
 -0.4048    -0.47035   -0.16345    0.080469  -0.048781   0.091551
  0.67828   -0.56955   -0.024643  -0.51526  ]
w2=[-1.6486e-01  9.1997e-01  2.2737e-01 -4.9031e-01 -1.8082e-03 -3.3803e-01
  5.7221e-02  1.4601e-01  4.0202e-01 -2.8858e-01 -4.7495e-01 -5.6369e-01
  2.7037e-01  5.1702e-01 -1.1241e-01  1.8314e-01  2.2066e-01 -4.8606e-01
 -8.7284e-01 -6.2587e-02  4.3016e-02  2.3641e-01  5.9705e-01 -3.8640e-01
 -2.5194e-01  9.6862e-01 -4.3112e-01 -4.8370e-01 -1.1396e+00  9.2425e-02
 -1.1476e-01 -7.4291e-02 -6.2524e-02 -9.5122e-02 -2.2714e-01  8.8291e-01
  3.9978e-01  7.6631e-01 -6.7697e-01 -6.2829e-01 -1.1872e-01 -2.4492e-01
 -5.8893e-01 -8.5088e-01  1.1107e+00  4.2190e-01 -1.5072e+00 -1.9509e-01
 -2.6712e-01 -7.0801e-01  5.5075e-01 -4.6929e-02 -2.5203e-01  7.4411e-01
 -1.8325e-01 -1.4885e+00 -4.6393e-01 -1.0338e-01  2.3525e+00 -1.5421e-01
  3.9833e-01  1.5344e-02  8.0708e-02 -2.7373e-01  9.7057e-01 -1.9383e-02
  2.0899e-01 -6.4033e-01  9.2509e-01 -4.5371e-01 -7.0564e-01 -1.6033e-01
 -7.1761e-02  6.2856e-01  3.5732e-01  8.8802e-01 -6.9127e-01  4.9634e-02
 -9.3347e-01  6.5396e-01  3.7165e-01  5.8363e-02 -1.0152e+00  7.0845e-01
 -1.3542e+00 -3.6390e-01  2.5994e-01 -1.8260e-01 -9.8930e-01 -4.4699e-01
  8.5016e-01  9.4532e-02  3.7019e-01 -5.0354e-01 -1.2083e+00 -3.5776e-01
  2.3899e-01 -6.7904e-02  1.5072e+00  6.0889e-01]

and their distance comes out as 1.08074426763993081

Upvotes: 3

Views: 5516

Answers (1)

perl

Reputation: 9941

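If the dot product of these vectors is negative, it's perfectly OK for cosine to return a value greater than 1 (see the formula used for cosine in the documentation). The function returns 1 minus the cosine similarity, and since the similarity lies between -1 and 1, the distance lies between 0 and 2.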

For example:

from scipy.spatial.distance import cosine

cosine([1], [-1])

Output:

2.0
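
To make the range concrete, here is a minimal sketch that recomputes the same quantity by hand in plain Python. The helper name cosine_distance is made up for illustration and is not part of scipy; it just follows the formula 1 - (u . v) / (||u|| * ||v||) and assumes neither vector is all zeros.

```python
import math

def cosine_distance(u, v):
    """Hypothetical helper: 1 - (u . v) / (||u|| * ||v||).

    Assumes both vectors are non-zero and have the same length.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Same direction: similarity near 1, so distance near 0
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])

# Orthogonal: dot product is 0, so distance is 1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])

# Opposite direction: similarity near -1, so distance near 2
opposite = cosine_distance([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

Any pair of vectors with a negative dot product lands between 1 and 2, which is the case for the two vectors in the question. Normalizing the vectors beforehand does not change this, because the formula already divides by both norms.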

Upvotes: 4
