thebeancounter

Reputation: 4839

Python: how to get the proper distance value out of a SciPy condensed distance matrix

I am using Python 2.7 with SciPy to calculate a distance matrix for an array.

I don't understand how to find the distance values I want in the returned condensed matrix.

See this example:

from scipy.spatial.distance import pdist
import numpy as np

a = np.array([[1],[4],[0],[5]])
print a
print pdist(a)

The second print statement outputs:

[ 3.  1.  4.  4.  1.  5.]

I found here that the ij entry in the condensed matrix should store the distance between the i-th and j-th entries, where i < j. I am wondering whether they mean ij as i*j or as i and j joined together as strings, e.g. for the pair 1,2: index 2 or index 12.

I can't find a consistent way to determine the index I need.

Look at my example: if the first option (i*j) were right, all of the distances from entry 0 to every other entry would be stored at index 0, which is clearly not the case.

Can anyone shed some light on how I can extract the distance from entry x to entry y? Which index am I looking for?

Thanks!

Upvotes: 0

Views: 821

Answers (1)

FlorianK

Reputation: 440

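This vector is in condensed form. It enumerates all pairs of indices (i, j) with i < j in lexicographic order (in your example, with 4 elements: 0,1 0,2 0,3 1,2 1,3 2,3) and holds the distance between the elements at those array entries. So the distance from entry 0 to entry 3 is at index 2, and the distance from entry 1 to entry 2 is at index 3.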
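To see the ordering directly, here is a minimal sketch that pairs each condensed entry with the (i, j) it belongs to, assuming the same example array as the question. It uses itertools.combinations, which yields the pairs with i < j in lexicographic order, matching the order pdist uses for the condensed vector.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

a = np.array([[1], [4], [0], [5]])
d = pdist(a)

# combinations() yields (0,1), (0,2), (0,3), (1,2), (1,3), (2,3):
# the pairs with i < j in lexicographic order, the same order as pdist's output.
pairs = list(combinations(range(len(a)), 2))
for (i, j), dist in zip(pairs, d):
    print("%d,%d -> %.1f" % (i, j, dist))
```

This prints 0,1 -> 3.0, then 0,2 -> 1.0, 0,3 -> 4.0, 1,2 -> 4.0, 1,3 -> 1.0 and 2,3 -> 5.0, one pair per line.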

There is also the squareform function (also in scipy.spatial.distance), which converts the condensed form into the square matrix form and vice versa. The square form is exactly what you expect: entry ij (row i, column j) stores the distance between the i-th and j-th entries. For example, if you import squareform and add print squareform(pdist(a)) at the end of your code, you get the following matrix:

array([[ 0.,  3.,  1.,  4.],
       [ 3.,  0.,  4.,  1.],
       [ 1.,  4.,  0.,  5.],
       [ 4.,  1.,  5.,  0.]])
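If you would rather not build the full n x n matrix, you can compute the condensed index yourself. The sketch below uses a hypothetical helper, condensed_index (not part of SciPy), based on counting the upper-triangle entries in the rows before row i; it is cross-checked against squareform on the question's array.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def condensed_index(i, j, n):
    """Position of the pair (i, j) in a condensed distance vector of n points.

    Hypothetical helper, not part of SciPy. Requires i != j; the order of
    i and j does not matter because the distance is symmetric.
    """
    if i == j:
        raise ValueError("the diagonal (i == j) is not stored; its distance is 0")
    if i > j:
        i, j = j, i
    # Rows 0..i-1 of the upper triangle contribute (n-1) + (n-2) + ... + (n-i)
    # entries, i.e. n*i - i*(i+1)//2; within row i, j sits at offset j - i - 1.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


a = np.array([[1], [4], [0], [5]])
d = pdist(a)
D = squareform(d)
n = len(a)

# Every off-diagonal entry of the square matrix matches the condensed lookup.
for x in range(n):
    for y in range(n):
        if x != y:
            assert d[condensed_index(x, y, n)] == D[x, y]

print(d[condensed_index(2, 3, n)])  # distance between entries 2 and 3: 5.0
```

squareform is simpler when you need many lookups; the index formula avoids allocating an n x n array when n is large.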

Upvotes: 2
