Reputation: 51
I'm trying to build the dendrogram for agglomerative hierarchical clustering, and for that I need the distance matrix. I started with:
import numpy as np
import pandas as pd
from scipy import ndimage
from scipy.cluster import hierarchy
from scipy.spatial import distance_matrix
from matplotlib import pyplot as plt
from sklearn import manifold, datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
%matplotlib inline
X1, y1 = make_blobs(n_samples=50, centers=[[4,4], [-2, -1], [1, 1], [10,4]], cluster_std=0.9)
plt.scatter(X1[:, 0], X1[:, 1], marker='o')
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
# Create a figure of size 6 inches by 4 inches.
plt.figure(figsize=(6,4))
# These two lines scale the data points down,
# otherwise they would be scattered very far apart.
# Compute the minimum and maximum of X1.
x_min, x_max = np.min(X1, axis=0), np.max(X1, axis=0)
# Rescale X1 to the [0, 1] range (min-max normalization).
X1 = (X1 - x_min) / (x_max - x_min)
# This loop displays all of the data points.
for i in range(X1.shape[0]):
    # Replace each data point with its cluster label (e.g. 0),
    # colour-coded with a colormap (plt.cm.nipy_spectral).
    plt.text(X1[i, 0], X1[i, 1], str(y1[i]),
             color=plt.cm.nipy_spectral(agglom.labels_[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})
# Remove the x ticks, y ticks, x and y axis
plt.xticks([])
plt.yticks([])
#plt.axis('off')
# Display the plot of the original data before clustering
plt.scatter(X1[:, 0], X1[:, 1], marker='.')
# Display the plot
plt.show()
dist_matrix = distance_matrix(X1,X1)
print(dist_matrix)
and I get a warning when I run this:
Z = hierarchy.linkage(dist_matrix, 'complete')
/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:1: ClusterWarning: scipy.cluster: The symmetric non-negative hollow observation matrix looks suspiciously like an uncondensed distance matrix
  """Entry point for launching an IPython kernel.
First of all, what does that mean and how can I solve it? Thanks
Upvotes: 5
Views: 11482
Reputation: 650
scipy.cluster.hierarchy.linkage expects a condensed distance matrix, not a squareform/uncondensed distance matrix. You've calculated a squareform distance matrix and need to convert it to condensed form; I suggest using scipy.spatial.distance.squareform. The following snippet reproduces your functionality (I've removed the plotting for brevity) without the warning.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from scipy.spatial import distance_matrix
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
X1, y1 = make_blobs(n_samples=50,
                    centers=[[4, 4], [-2, -1], [1, 1], [10, 4]],
                    cluster_std=0.9)
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
dist_matrix = distance_matrix(X1,X1)
print(dist_matrix.shape)
condensed_dist_matrix = squareform(dist_matrix)
print(condensed_dist_matrix.shape)
Z = hierarchy.linkage(condensed_dist_matrix, 'complete')
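Once you have Z, the dendrogram you were originally after is one more call away; a minimal sketch (figure size and axis labels are just illustrative choices):
from matplotlib import pyplot as plt
# Plot the hierarchical clustering encoded in the linkage matrix Z.
plt.figure(figsize=(8, 4))
hierarchy.dendrogram(Z)
plt.xlabel('sample index')
plt.ylabel('distance')
plt.show()
Note that hierarchy.linkage will also accept the raw observations directly (e.g. hierarchy.linkage(X1, 'complete')), in which case it computes the pairwise Euclidean distances itself and no squareform conversion is needed.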
Upvotes: 7
Reputation: 119
The warning means that the matrix you passed to hierarchy.linkage is square, symmetric, non-negative, and has a zero diagonal, so SciPy suspects it is an uncondensed (squareform) distance matrix rather than the condensed form it expects.
If you just want to ignore the warning, you can add the code below at the top of your script:
from scipy.cluster.hierarchy import ClusterWarning
from warnings import simplefilter
simplefilter("ignore", ClusterWarning)
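If you'd rather not silence the warning globally, a sketch using the standard library's warnings context manager limits the suppression to the linkage call itself:
import warnings
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import ClusterWarning

# Suppress ClusterWarning only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ClusterWarning)
    Z = hierarchy.linkage(dist_matrix, 'complete')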
Upvotes: 4