AKP

Reputation: 51

warning: uncondensed distance matrix in python

I am trying to make a dendrogram for agglomerative hierarchical clustering, and for that I need the distance matrix. I started with:

import numpy as np 
import pandas as pd
from scipy import ndimage 
from scipy.cluster import hierarchy 
from scipy.spatial import distance_matrix 
from matplotlib import pyplot as plt 
from sklearn import manifold, datasets 
from sklearn.cluster import AgglomerativeClustering 
from sklearn.datasets.samples_generator import make_blobs 
%matplotlib inline
X1, y1 = make_blobs(n_samples=50, centers=[[4,4], [-2, -1], [1, 1], [10,4]], cluster_std=0.9)
plt.scatter(X1[:, 0], X1[:, 1], marker='o') 
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
# Create a figure of size 6 inches by 4 inches.
plt.figure(figsize=(6,4))

# These two lines of code scale the data points down;
# otherwise the data points will be scattered very far apart.

# Get the minimum and maximum of each feature in X1.
x_min, x_max = np.min(X1, axis=0), np.max(X1, axis=0)

# Min-max scale X1 to the range [0, 1].
X1 = (X1 - x_min) / (x_max - x_min)

# This loop displays all of the datapoints.
for i in range(X1.shape[0]):
    # Replace each data point with its cluster label (e.g. 0),
    # color-coded with a colormap (plt.cm.nipy_spectral)
    plt.text(X1[i, 0], X1[i, 1], str(y1[i]),
             color=plt.cm.nipy_spectral(agglom.labels_[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})

# Remove the x ticks, y ticks, x and y axis
plt.xticks([])
plt.yticks([])
#plt.axis('off')



# Overlay a scatter plot of the (scaled) data points
plt.scatter(X1[:, 0], X1[:, 1], marker='.')
# Display the plot
plt.show()
dist_matrix = distance_matrix(X1,X1) 
print(dist_matrix)

and I get a warning when I write this:

Z = hierarchy.linkage(dist_matrix, 'complete')

/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:1: ClusterWarning: scipy.cluster: The symmetric non-negative hollow observation matrix looks suspiciously like an uncondensed distance matrix
  """Entry point for launching an IPython kernel.

First of all, what does that mean and how can I solve it? Thanks

Upvotes: 5

Views: 11482

Answers (2)

ahagen

Reputation: 650

scipy.cluster.hierarchy.linkage expects a condensed distance matrix, not a square (uncondensed) distance matrix. You've computed a square distance matrix and need to convert it to condensed form; I suggest using scipy.spatial.distance.squareform. The following snippet reproduces your functionality (I've removed the plotting for brevity) without the warning.

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from scipy.spatial import distance_matrix
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

X1, y1 = make_blobs(n_samples=50, centers=[[4, 4],
                                           [-2, -1],
                                           [1, 1],
                                           [10, 4]], cluster_std=0.9)

agglom = AgglomerativeClustering(n_clusters=4, linkage='average')
agglom.fit(X1, y1)

# Square (n x n) pairwise distance matrix.
dist_matrix = distance_matrix(X1, X1)
print(dist_matrix.shape)            # (50, 50)

# Condensed form: the upper triangle flattened into a vector
# of length n * (n - 1) / 2.
condensed_dist_matrix = squareform(dist_matrix)
print(condensed_dist_matrix.shape)  # (1225,)

Z = hierarchy.linkage(condensed_dist_matrix, 'complete')
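
Since your original goal was the dendrogram, here is a minimal sketch (assuming matplotlib is available) of plotting it from the linkage matrix Z computed above:

from matplotlib import pyplot as plt

# Plot the dendrogram from the linkage matrix Z.
plt.figure(figsize=(6, 4))
hierarchy.dendrogram(Z)
plt.show()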

Upvotes: 7

blue

Reputation: 119

It means the distance matrix you pass to

Z = hierarchy.linkage(dist_matrix, 'complete')

is symmetric with a zero diagonal (it equals its own transpose), so SciPy suspects it is an uncondensed (square) distance matrix rather than the condensed form it expects. If you just want to ignore the warning, you can add the code below at the top of your script:

from scipy.cluster.hierarchy import ClusterWarning
from warnings import simplefilter
simplefilter("ignore", ClusterWarning)

Upvotes: 4
