I want to pass my own distance matrix (row linkages) to seaborn clustermap.
There are already some posts on this, such as
Use Distance Matrix in scipy.cluster.hierarchy.linkage()?
But they all point to scipy.cluster.hierarchy.linkage, which takes the clustering metric and method as arguments:
scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)
The input y may be either a 1-D condensed distance matrix or a 2-D array of observation vectors.
What I don't get is this:
My distance matrix is already based on a certain metric and method, so why would I want to recalculate this in scipy.cluster.hierarchy.linkage?
Is there an option where it purely uses my distances and creates the linkages?
Upvotes: 4
Views: 1483
For posterity, here is a complete method of how to do this, as @WarrenWeckesser in the comments and @SibbsGambling in the linked answer leave out some details.
Suppose distMatrix is your matrix of distances (they don't have to be Euclidean), with the entry in row i and column j representing the distance between the i-th and j-th objects. Then:
# import packages
from scipy.cluster import hierarchy
import scipy.spatial.distance as ssd
import seaborn as sns
# define distance array as in linked answer
distArray = ssd.squareform(distMatrix)
# define linkage object
distLinkage = hierarchy.linkage(distArray)
# make clustermap
sns.clustermap(distMatrix, row_linkage=distLinkage, col_linkage=distLinkage)
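Note that when creating the clustermap, you still have to pass the original matrix as the data argument. If you want to use a different clustering method, such as method='ward', include that option when defining distLinkage. Keep in mind that the scipy documentation describes 'ward', 'centroid', and 'median' as correctly defined only for Euclidean distances, so for other distances a method such as 'average' or 'complete' is the safer choice.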
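To check that linkage really does use the supplied distances as-is, here is a minimal sketch. It assumes a small made-up set of points and Manhattan distances purely for illustration (points, rng, and referenceLinkage are hypothetical names, not part of the original question). When linkage is given a 1-D condensed array, the metric argument plays no role and only method matters, so the result should match what scipy produces when it computes the same distances itself.

```python
import numpy as np
from scipy.cluster import hierarchy
import scipy.spatial.distance as ssd

# Hypothetical data: 6 random 2-D points, used only to build an example distance matrix.
rng = np.random.default_rng(0)
points = rng.random((6, 2))

# Square, symmetric distance matrix with a zero diagonal (Manhattan distances here).
distMatrix = ssd.squareform(ssd.pdist(points, metric="cityblock"))

# Convert the square matrix to the 1-D condensed form that linkage treats as distances.
# squareform validates symmetry and the zero diagonal by default.
distArray = ssd.squareform(distMatrix)

# Build the linkage directly from the precomputed distances.
distLinkage = hierarchy.linkage(distArray, method="average")

# Reference: let scipy compute the same distances from the raw observations.
referenceLinkage = hierarchy.linkage(points, method="average", metric="cityblock")

# Both routes should give the same hierarchy.
print(np.allclose(distLinkage, referenceLinkage))
```

The resulting distLinkage can then be passed as row_linkage and col_linkage to sns.clustermap, exactly as in the snippet above. Passing the square distMatrix itself to linkage would instead treat each row as an observation vector, which is usually not what is wanted.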
Upvotes: 4