Rain Lee

Reputation: 531

Python: compute a distance matrix from dictionary data

I want to compute a distance matrix from dictionary data like the following:

y = {"a": ndarray1, "b": ndarray2, "c": ndarray3}

The value of each key ("a", "b", "c") is an np.ndarray, and the arrays differ in size. I have a dist() function that computes the distance between two of them, e.g. dist(y["a"], y["b"]).

The resulting distance matrix would then be:

+----+-----+----------------------------+----------------------------+
|    | a   | b                          | c                          |
+----+-----+----------------------------+----------------------------+
| a  | 0   | mydist(ndarray1, ndarray2) | mydist(ndarray1, ndarray3) |
| b  |     | 0                          | mydist(ndarray2, ndarray3) |
| c  |     |                            | 0                          |
+----+-----+----------------------------+----------------------------+

I have tried scipy.spatial.distance.pdist with pdist(y, mydist), but got the following error:

[X] = _copy_arrays_if_base_present([_convert_to_double(X)])
  File "/usr/local/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 113, in _convert_to_double
X = X.astype(np.double)
TypeError: float() argument must be a string or a number

Can anyone tell me how to implement this pdist myself? I want to use the pdist result for further hierarchical clustering.

Upvotes: 1

Views: 1185

Answers (1)

CT Zhu

Reputation: 54340

The first part of your question is quite clear, but I am not sure what you are asking in the second part. Why do you need to re-implement scipy.spatial.distance.pdist? I thought you already had a dist() function to calculate the pairwise distance.

To get pairwise distance, when you already have a dist() function to calculate it:

In [69]:
D={'a':some_value,'b':some_value,'c':some_value}
In [70]:
import itertools
In [71]:
list(itertools.combinations(D,2))
Out[71]:
[('a', 'c'), ('a', 'b'), ('c', 'b')]

In [72]: #this is what you need:
[dist(*map(D.get, item)) for item in itertools.combinations(D,2)]
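
If the goal is to feed the result into hierarchical clustering, the order of the pairs matters, and a plain dict does not guarantee one in older Python versions. Here is a minimal sketch of the same idea with the keys sorted first, so the distances come out in the row-major upper-triangle order that a condensed distance vector uses. The helper names condensed_distances and to_square, and the toy length_gap function standing in for your dist(), are made up for illustration; plain lists stand in for your arrays.

```python
import itertools


def condensed_distances(data, dist):
    """Return (keys, distances) with distances in condensed (pdist-style) order."""
    keys = sorted(data)  # fix an explicit order instead of relying on dict order
    # combinations() yields (k0, k1), (k0, k2), ..., (k1, k2), ...
    # which walks the upper triangle of the distance matrix row by row.
    distances = [dist(data[u], data[v])
                 for u, v in itertools.combinations(keys, 2)]
    return keys, distances


def to_square(keys, distances):
    """Expand a condensed distance list into a full symmetric matrix."""
    n = len(keys)
    square = [[0.0] * n for _ in range(n)]
    pairs = itertools.combinations(range(n), 2)
    for (i, j), d in zip(pairs, distances):
        square[i][j] = square[j][i] = d
    return square


# Hypothetical stand-in for dist(): absolute difference of the lengths.
def length_gap(u, v):
    return float(abs(len(u) - len(v)))


y = {"a": [1, 2, 3], "b": [4, 5], "c": [6, 7, 8, 9, 10, 11]}
keys, condensed = condensed_distances(y, length_gap)
square = to_square(keys, condensed)
```

The condensed list has the same layout that pdist returns, so something like scipy.cluster.hierarchy.linkage(condensed, method="average") should accept it; keys tells you which label each row of the result refers to.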

Upvotes: 1
