I want to compute a distance matrix from dictionary data like the following:
y = {"a": ndarray1, "b": ndarray2, "c": ndarray3}
The value of each key ("a", "b", "c") is an np.ndarray, and the arrays have different sizes. I also have a dist() function that computes the distance between y["a"] and y["b"] via dist(y["a"], y["b"]).
The resulting distance matrix should look like this:
+---+---+--------------------------+--------------------------+
|   | a | b                        | c                        |
+---+---+--------------------------+--------------------------+
| a | 0 | dist(ndarray1, ndarray2) | dist(ndarray1, ndarray3) |
| b |   | 0                        | dist(ndarray2, ndarray3) |
| c |   |                          | 0                        |
+---+---+--------------------------+--------------------------+
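A minimal sketch of filling that matrix directly with a double loop over the upper triangle, assuming dist() is symmetric and dist(x, x) == 0. The dist() below (absolute difference of the array means) is a hypothetical stand-in for the real function, there only so the sketch runs; distance_matrix() is likewise a hypothetical helper name.

```python
import itertools

import numpy as np


def dist(u, v):
    # Hypothetical stand-in for the real dist(); accepts arrays of any length.
    return abs(float(np.mean(u)) - float(np.mean(v)))


def distance_matrix(data, metric):
    """Return (keys, M) where M[i, j] = metric(data[keys[i]], data[keys[j]])."""
    keys = sorted(data)  # fix the row/column order explicitly
    n = len(keys)
    M = np.zeros((n, n))  # diagonal stays 0
    # Compute each unordered pair once and mirror it (assumes a symmetric metric).
    for i, j in itertools.combinations(range(n), 2):
        M[i, j] = M[j, i] = metric(data[keys[i]], data[keys[j]])
    return keys, M


y = {"a": np.array([1.0, 2.0, 3.0]), "b": np.array([4.0, 6.0]), "c": np.array([10.0])}
keys, M = distance_matrix(y, dist)
print(keys)
print(M)
```

Sorting the keys makes the row order independent of how the dictionary happens to iterate.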
I have tried scipy.spatial.distance.pdist with pdist(y, dist), but got this error:
[X] = _copy_arrays_if_base_present([_convert_to_double(X)])
File "/usr/local/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 113, in _convert_to_double
X = X.astype(np.double)
TypeError: float() argument must be a string or a number
Can anyone tell me how to implement this pdist myself? I want to use the result for hierarchical clustering.
The first part of your question is quite clear, but I'm not sure what you are asking in the second part. Why do you need to re-implement scipy.spatial.distance.pdist? You already have a dist() function to calculate the pairwise distance.
To get the pairwise distances when you already have a dist() function:
import itertools

D = {'a': some_value, 'b': some_value, 'c': some_value}

list(itertools.combinations(D, 2))
# [('a', 'c'), ('a', 'b'), ('c', 'b')] in the original Python 2 session;
# the pair order follows the dict's iteration order (insertion order on Python 3.7+)

# this is what you need:
[dist(*map(D.get, item)) for item in itertools.combinations(D, 2)]