user248237

Hierarchical clustering on correlations in Python scipy/numpy?

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster the rows by the correlation of each entry's values across the 9 conditions, using 1 - Pearson correlation as the distance. Assuming I have a numpy array X that contains the 100 x 9 matrix, how can I do this?

I tried using hcluster, based on this example:

Y=pdist(X, 'seuclidean')
Z=linkage(Y, 'single')
dendrogram(Z, color_threshold=0)

However, pdist with 'seuclidean' is not what I want, since that is a standardized Euclidean distance rather than a correlation-based one. Any ideas?

thanks.

Upvotes: 13

Views: 14586

Answers (2)

nemo

Reputation: 379

I find it helpful to perform and visualize the hierarchical clustering with seaborn's clustermap (which uses scipy under the hood for the clustering), after computing the distances with pdist and the 'correlation' metric:

import seaborn as sns
from scipy.cluster.hierarchy import dendrogram
from scipy.spatial.distance import pdist, squareform

D = squareform(pdist(X.T, 'correlation'))
h = sns.clustermap(D, cmap='Reds')

You can also recover the corresponding linkage matrix and plot the dendrogram:

Z = h.dendrogram_col.linkage
dendrogram(Z, color_threshold=0)
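
Two details of the snippet above are worth checking against your own data. pdist compares rows, so X.T compares the 9 conditions, while X compares the 100 entries. Also, with its default settings clustermap builds its tree from Euclidean distances between the rows of whatever matrix it receives, so handing it D clusters rows of the distance matrix rather than using the correlation distances directly. The following is a minimal scipy-only sketch of building the tree from the correlation distances themselves; the random X, the 'average' method, and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the 100 x 9 array from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# pdist compares rows: X.T compares the 9 conditions,
# X compares the 100 entries.
d_conditions = pdist(X.T, 'correlation')   # length 9 * 8 / 2
d_entries = pdist(X, 'correlation')        # length 100 * 99 / 2

# squareform only reshapes the condensed vector into a symmetric matrix.
D = squareform(d_conditions)               # shape (9, 9)

# Build the tree from the condensed correlation distances themselves.
# 'average' is an arbitrary choice of linkage method for this sketch.
Z = linkage(d_conditions, method='average')

# Order of the 9 conditions along the dendrogram leaves,
# used here to reorder the distance matrix for display.
order = leaves_list(Z)
D_ordered = D[np.ix_(order, order)]
```

If you still want the seaborn heatmap, a linkage matrix computed this way can be supplied through clustermap's row_linkage and col_linkage parameters instead of letting it recompute one.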

Upvotes: 1

Justin Peel

Reputation: 47072

Just change the metric to correlation so that the first line becomes:

Y=pdist(X, 'correlation')

However, I believe that the code can be simplified to just:

Z=linkage(X, 'single', 'correlation')
dendrogram(Z, color_threshold=0)

because linkage will run pdist for you when it is given the raw observation matrix together with a metric.
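
To convince yourself that both forms agree, and that 'correlation' really is 1 - Pearson correlation between rows, something like the sketch below can be run. The random X is a hypothetical stand-in for the 100 x 9 array, and the cut into 4 flat clusters is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the asker's 100 x 9 array.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Condensed vector of pairwise correlation distances between rows.
Y = pdist(X, 'correlation')

# The 'correlation' metric is 1 - Pearson r, so it should match
# 1 - np.corrcoef(X), which also correlates rows with each other.
D_manual = 1.0 - np.corrcoef(X)
matches_corrcoef = np.allclose(squareform(Y), D_manual)

# Passing the raw observations plus a metric should give the same tree
# as passing the precomputed condensed distances.
Z_direct = linkage(X, 'single', 'correlation')
Z_from_pdist = linkage(Y, 'single')
same_linkage = np.allclose(Z_direct, Z_from_pdist)

# Cut the tree into at most 4 flat clusters (4 is arbitrary here).
labels = fcluster(Z_direct, t=4, criterion='maxclust')
```

Keeping the explicit pdist step is still useful when you want to inspect the distances or reuse them with several linkage methods.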

Upvotes: 14
