Someone Someoneelse

Reputation: 485

Can anyone provide me with some clustering examples?

I am having a hard time understanding what scipy.cluster.vq really does!!

Wikipedia says clustering can be used to divide a digital image into distinct regions for border detection or object recognition.
Other sites and books say clustering methods can be used on images to find groups of similar images.
As I am interested in image processing, I really need to fully understand what clustering is.

So, can anyone show me simple examples of using scipy.cluster.vq with images?

Upvotes: 1

Views: 1703

Answers (2)

John Vinyard

Reputation: 13485

The kind of clustering performed by scipy.cluster.vq is definitely of the latter (groups of similar images) variety.

The only clustering algorithm implemented in scipy.cluster.vq is the K-Means algorithm, which typically treats input data as points in n-dimensional Euclidean space, and attempts to divide that space so that new, incoming data can be summarized by saying "example x is most like centroid y". Centroids can be thought of as prototypical examples of the input data. Vector quantization leads to concise, or compressed, representations because, instead of remembering all 100 pixels of each new image we see, we can remember a single integer that points at the prototypical example the new image is most like.

If you had many small grayscale images:

>>> import numpy as np
>>> images = np.random.random_sample((100,10,10))

So, we've got 100 10x10 pixel images. Let's assume they already all have similar brightness and contrast. The scipy kmeans implementation expects flat vectors:

>>> images = images.reshape((100,100))
>>> images.shape
(100, 100)

Now, let's train the K-Means algorithm so that any new incoming image can be assigned to one of 10 clusters:

>>> from scipy.cluster.vq import kmeans, vq
>>> codebook,distortion = kmeans(images,10)

Finally, let's say we have five new images we'd like to assign to one of the ten clusters:

>>> newimages = np.random.random_sample((5,10,10))
>>> # vq returns a (codes, distances) pair; the codes are the cluster indices
>>> clusters, distances = vq(newimages.reshape((5,100)),codebook)

clusters will contain the integer index of the best matching centroid for each of the five examples.

This is kind of a toy example, and won't yield great results unless the objects of interest in the images you're working with are all centered. Since objects of interest might appear anywhere in larger images, it's typical to learn centroids for smaller image "patches", and then convolve them (compare them at many different locations) with larger images to promote translation-invariance.
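
The patch idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `extract_patches` is a hypothetical helper (not part of scipy), the images are random stand-ins, and the patch size, step, and number of centroids are arbitrary. Instead of a true convolution, it simply assigns each patch location of a new image to its nearest centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(0)

def extract_patches(image, size, step):
    """Hypothetical helper: flatten every size x size patch on a regular grid."""
    rows = range(0, image.shape[0] - size + 1, step)
    cols = range(0, image.shape[1] - size + 1, step)
    return np.array([image[r:r + size, c:c + size].ravel()
                     for r in rows for c in cols])

# Stand-in data: 20 random 32x32 grayscale images (real images would go here).
training_images = rng.random((20, 32, 32))

# Learn centroids for 8x8 patches rather than for whole images.
patches = np.vstack([extract_patches(img, 8, 4) for img in training_images])
codebook, distortion = kmeans(patches, 8)

# Compare the centroids against a new image at many locations:
# each of the 7x7 patch positions gets the index of its nearest centroid.
new_image = rng.random((32, 32))
codes, distances = vq(extract_patches(new_image, 8, 4), codebook)
code_map = codes.reshape(7, 7)
```

Note that kmeans drops centroids that end up with no members, so the codebook may contain fewer rows than requested.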

Upvotes: 3

Has QUIT--Anony-Mousse

Reputation: 77454

The second description is what clustering is: grouping objects that are somewhat similar (and those objects could be images). Clustering is not purely an imaging technique.

When processing a single image, it can, for example, be applied to colors. This is quite a good approach for reducing the number of colors in an image. If you cluster by colors and pixel coordinates, you can also use it for image segmentation, as it will group pixels that have a similar color and are close to each other. But this is an application domain of clustering, not pure clustering.
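
Both uses within a single image could look something like this with scipy.cluster.vq. It is a minimal sketch, not a tuned method: the image is random stand-in data, the cluster counts are arbitrary, and `spatial_weight` is a hypothetical knob for how much pixel position counts relative to color.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(0)
image = rng.random((40, 60, 3))  # stand-in RGB image with values in [0, 1)

height, width, channels = image.shape
pixels = image.reshape(-1, channels)  # one row per pixel

# Color reduction: cluster the colors, then replace each pixel
# with the centroid (palette entry) it is closest to.
palette, _ = kmeans(pixels, 8)
labels, _ = vq(pixels, palette)
quantized = palette[labels].reshape(height, width, channels)

# Segmentation: cluster on color plus (scaled) pixel coordinates, so
# pixels must be similar in color AND close together to share a cluster.
rows, cols = np.mgrid[0:height, 0:width]
coords = np.stack([rows.ravel() / height, cols.ravel() / width], axis=1)
spatial_weight = 0.5  # assumed value; larger means more compact segments
features = np.hstack([pixels, spatial_weight * coords])
centroids, _ = kmeans(features, 4)
segment_ids, _ = vq(features, centroids)
segments = segment_ids.reshape(height, width)
```

Here `quantized` is the image redrawn with at most 8 colors, and `segments` holds one cluster index per pixel.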

Upvotes: 0
