Reputation: 1919
I have a very large sparse matrix (few million rows, 500 columns).
I have already computed a 5000x5000 distance matrix.
I need to use scipy.cluster.hierarchy.linkage
to get the clustering according to this matrix.
I know that linkage
accepts a custom function, but computing this distance matrix again is very time consuming.
How can I tell scipy to use the distances from this precomputed matrix?
I tried
dist = my_dist(X) # numpy array ndim = 2
linkage(X, metric=lambda x, y: dist[x, y])
but the x,y
passed are the values and not the indexes.
Upvotes: 1
Views: 2226
Reputation: 114881
You can pass the distance matrix to linkage
if you represent it as a "condensed" distance matrix, i.e. a 1-D array holding only the upper-triangle entries. You can use scipy.spatial.distance.squareform
to convert dist
to the condensed representation.
Something like this:
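from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
dist = my_dist(X)  # square, symmetric distance matrix with a zero diagonal
condensed_dist = squareform(dist)  # 1-D array of length n*(n-1)/2
linkresult = linkage(condensed_dist)  # a 1-D input is treated as precomputed distances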
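To check the round trip on something small, here is a minimal sketch. The toy `points` array, the choice of `method="average"`, and the `fcluster` step are assumptions added for illustration and are not part of the original question:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data standing in for the precomputed distance matrix.
points = np.array([[0.0], [0.1], [5.0], [5.2]])
dist = squareform(pdist(points))  # 4x4 square matrix, symmetric, zero diagonal

# Square -> condensed: keeps only the upper triangle as a 1-D array.
condensed = squareform(dist)

# linkage treats a 1-D input as condensed distances, so no metric is needed.
Z = linkage(condensed, method="average")

# Cut the tree into two flat clusters to inspect the result.
labels = fcluster(Z, t=2, criterion="maxclust")
print(condensed.shape, Z.shape, labels)
```

For a 5000x5000 matrix the condensed array has 5000*4999/2 = 12,497,500 entries. Note that the "centroid", "median" and "ward" methods are documented as correct only for Euclidean distances, so a method such as "single", "complete" or "average" is likely the safer choice with a custom distance.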
Upvotes: 4