Utpal Datta

Reputation: 446

usage of linkage function from scipy.cluster.hierarchy

I am trying to apply hierarchical clustering to the pixel values of an image. The goal is to split the image into regions and extract segments of similar color. The segmentation should be by color similarity only, not by shape. Here is what I am trying (assume the image is loaded as a NumPy array of shape (256, 256, 3); I can't share the picture due to copyright):

from scipy.cluster.hierarchy import dendrogram, linkage
# img is a (256, 256, 3) NumPy array; flatten it to one RGB row per pixel
ppp = img.reshape(img.shape[0] * img.shape[1], img.shape[2])
Z = linkage(ppp, method='ward')
dendrogram(Z, leaf_rotation=90., leaf_font_size=8.)
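For reference, the same pipeline does run when the image is small. This sketch uses a hypothetical 16×16 random RGB array as a stand-in for the real image (an assumption, since the original picture isn't available), so there are only 256 pixels and 256*255/2 = 32640 pairwise distances. It also passes no_plot=True to dendrogram so the layout is computed without needing matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical stand-in for the real image: a small random 16x16 RGB array.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3)).astype(np.float64)

# One row per pixel, one column per color channel: shape (256, 3).
ppp = img.reshape(img.shape[0] * img.shape[1], img.shape[2])

# Ward linkage on the pixel colors; Z has one row per merge, i.e. n - 1 rows of 4 values.
Z = linkage(ppp, method='ward')

# no_plot=True computes the dendrogram structure without drawing it.
d = dendrogram(Z, no_plot=True)
print(ppp.shape, Z.shape, len(d['leaves']))
```

The logic itself is fine; only the number of pixels (and therefore the number of pairwise distances) makes the full-size version fail.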

This gives the following error:

MemoryError                               Traceback (most recent call last)
<ipython-input-87-39453b2f2da1> in <module>()
     14     ppp=img.reshape(img.shape[0]*img.shape[1],img.shape[2])
---> 15     Z = linkage(ppp, method = 'ward')
     16     dendrogram(Z,leaf_rotation=90.,    leaf_font_size=8.,)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\cluster\hierarchy.py in linkage(y, method, metric, optimal_ordering)
    706                          'matrix looks suspiciously like an uncondensed '
    707                          'distance matrix')
--> 708         y = distance.pdist(y, metric)
    709     else:
    710         raise ValueError("`y` must be 1 or 2 dimensional.")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
   1650     out = kwargs.pop("out", None)
   1651     if out is None:
-> 1652         dm = np.empty((m * (m - 1)) // 2, dtype=np.double)
   1653     else:
   1654         if out.shape != (m * (m - 1) // 2,):

Can you please help?

Upvotes: 0

Views: 1115

Answers (1)

Warren Weckesser

Reputation: 114871

ppp has shape (65536, 3), so m in that traceback is 65536. Internally, linkage calls pdist, which creates an array of m*(m-1)//2 floating-point values to hold all the pairwise distances (the condensed distance matrix). In your case this is 2147450880 elements. Each double-precision element requires eight bytes, so the total size of the array is 17179607040 bytes. That's over 17 gigabytes (just under 16 GiB). Presumably you don't have enough memory to allocate such an array.
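The arithmetic above can be checked in a few lines. This is a minimal sketch: condensed_size is a hypothetical helper name, and the small pdist call simply confirms that pdist returns m*(m-1)//2 values, matching the np.empty call in the traceback.

```python
import numpy as np
from scipy.spatial.distance import pdist

def condensed_size(m):
    """Number of pairwise distances pdist stores for m observations (hypothetical helper)."""
    return m * (m - 1) // 2

m = 256 * 256                                    # pixels in a 256x256 image
n_dist = condensed_size(m)                       # entries in the condensed distance matrix
n_bytes = n_dist * np.dtype(np.double).itemsize  # 8 bytes per float64 entry
print(m, n_dist, n_bytes, n_bytes / 1e9, n_bytes / 2**30)

# Sanity check on a small input: pdist really returns m*(m-1)//2 values.
X = np.zeros((10, 3))
assert pdist(X).shape == (condensed_size(10),)
```

Because the count grows with the square of the number of pixels, halving the image side length cuts the memory by roughly a factor of 16.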

Upvotes: 1
