DeanLa

Reputation: 1919

scipy linkage with a given distance matrix

I have a very large sparse matrix (a few million rows, 500 columns). I have already computed a 5000 x 5000 distance matrix. I need to use scipy.cluster.hierarchy.linkage to get the clustering according to this matrix. I know that linkage accepts a custom metric function, but computing the distances again is very time-consuming.
How can I tell scipy to use the distances from this matrix? I tried:

from scipy.cluster.hierarchy import linkage

dist = my_dist(X)  # numpy array, ndim == 2
linkage(X, metric=lambda x, y: dist[x, y])

but the x and y passed to the metric are the row values (1-D arrays), not the row indices.
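To see why the lambda cannot index into the precomputed matrix, here is a small sketch with a made-up 4 x 3 array (the data and the `recording_metric` helper are hypothetical, not from the question). When `linkage` receives a 2-D array of observations and a callable `metric`, it computes the pairwise distances with `scipy.spatial.distance.pdist`, which calls the metric once per pair of rows and hands it the rows themselves:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: 4 observations with 3 features each.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [5.0, 5.0, 5.0]])

seen = []  # records every (u, v) pair the metric is called with

def recording_metric(u, v):
    # u and v are 1-D rows of X, not row indices.
    seen.append((u.copy(), v.copy()))
    return float(np.linalg.norm(u - v))

Z = linkage(X, method="single", metric=recording_metric)

# One call per unordered pair of observations: 4 * 3 / 2 == 6.
print(len(seen))  # 6
```

With 5000 observations this would also mean roughly 12.5 million Python-level calls, so even a metric that could find the indices would be slow compared with handing linkage the distances directly.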

Upvotes: 1

Views: 2226

Answers (1)

Warren Weckesser

Reputation: 114881

You can pass the distance matrix to linkage if you represent it as a "condensed" distance matrix. You can use scipy.spatial.distance.squareform to convert dist to the condensed representation.
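A condensed distance matrix is the upper triangle of the square matrix (diagonal excluded), flattened row by row into a 1-D array of length n(n-1)/2; for a 5000 x 5000 matrix that is 5000 * 4999 / 2 = 12,497,500 entries. A minimal sketch with a made-up 4 x 4 matrix (hypothetical values) shows the conversion in both directions. By default squareform checks that the input is symmetric with a zero diagonal and raises an error otherwise; `checks=False` skips that check.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical 4 x 4 distance matrix: symmetric, zero diagonal.
dist = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 0.0, 4.0, 5.0],
                 [2.0, 4.0, 0.0, 6.0],
                 [3.0, 5.0, 6.0, 0.0]])

# Square -> condensed: entries (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
condensed = squareform(dist)
print(condensed)  # [1. 2. 3. 4. 5. 6.]

# Condensed -> square: a 1-D input is expanded back to the full matrix.
restored = squareform(condensed)
print(np.array_equal(restored, dist))  # True
```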

Something like this:

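from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

dist = my_dist(X)  # square, symmetric, zero diagonal
condensed_dist = squareform(dist)  # 1-D upper triangle, length n*(n-1)/2
linkresult = linkage(condensed_dist)  # method='single' by default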
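For an end-to-end check, here is a sketch on a hypothetical 4 x 4 matrix in which items 0 and 1 are close, items 2 and 3 are close, and the two pairs are far apart. It uses `method="average"` purely for illustration (the code above uses linkage's default, `'single'`) and then cuts the tree into flat cluster labels with `scipy.cluster.hierarchy.fcluster`; `criterion="maxclust"` forms at most `t` clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical precomputed distances between 4 items.
dist = np.array([[0.0, 1.0, 9.0, 9.0],
                 [1.0, 0.0, 9.0, 9.0],
                 [9.0, 9.0, 0.0, 2.0],
                 [9.0, 9.0, 2.0, 0.0]])

Z = linkage(squareform(dist), method="average")

# Each row of Z: [cluster_a, cluster_b, merge distance, size of new cluster].
# New clusters get ids 4, 5, ... in the order they are formed.
print(Z)

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Items 0 and 1 end up sharing one label and items 2 and 3 the other, matching the structure of the matrix.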

Upvotes: 4
