anonuser0428

Reputation: 12343

Scatter plot: distinguish clusters by color (matplotlib, Python)

I am working on a clustering algorithm and need all points in my scatter plot that belong to the same cluster to be drawn in the same color. I have a list that indicates, for each point, which cluster it belongs to, as an integer from 0 to k-1, where k is the number of clusters. I would like to know how to map this list to colors (preferably exactly as many colors as there are clusters, which is known beforehand). I am working with matplotlib in Python and am completely lost as to how to solve this problem.

import matplotlib.pyplot as plt

plt.scatter([item[0] for item in dataset], [item[1] for item in dataset], color='b')
plt.scatter([item[0] for item in centroids_list], [item[1] for item in centroids_list], color='r')

plt.show()

This is all I have so far: the data points are drawn in blue and the centroids in red. I would like to keep the centroids red and change only the color of the dataset points, so that points in the same cluster share a color. I am lost in the sea that is the matplotlib library and would really appreciate any help.

Thanks in advance!

Upvotes: 2

Views: 11501

Answers (3)

alexey

Reputation: 706

If you use NumPy arrays, you can simplify the slicing, and if you pass the cluster labels as the c parameter, it should work fine:

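import numpy as np
dataset, centroids_list = np.asarray(dataset), np.asarray(centroids_list)
plt.scatter(dataset[:, 0], dataset[:, 1], c=clusters)  # integer labels are mapped through the colormap
plt.scatter(centroids_list[:, 0], centroids_list[:, 1], s=70, c='r')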
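Since the question asks for exactly as many colors as clusters, one option (an assumption, not part of the answer above) is to pass the labels as c together with a ListedColormap of k entries and fix vmin/vmax so that label j always lands on entry j. plot_clusters is a hypothetical helper name, the colors are taken from matplotlib's default color cycle, the centroids get marker="x" so they stand out, and the Agg line only makes the sketch run headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def plot_clusters(dataset, labels, centroids, k):
    """Scatter points coloured by integer cluster label 0..k-1; centroids in red."""
    dataset = np.asarray(dataset, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # Exactly k colours, one per cluster, taken from the default colour cycle (repeats if k > 10).
    base = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    cmap = ListedColormap([base[i % len(base)] for i in range(k)])
    fig, ax = plt.subplots()
    # vmin/vmax at -0.5 and k-0.5 centre each integer label on its own colormap entry.
    points = ax.scatter(dataset[:, 0], dataset[:, 1], c=labels, cmap=cmap,
                        vmin=-0.5, vmax=k - 0.5)
    ax.scatter(centroids[:, 0], centroids[:, 1], s=70, c="r", marker="x")
    return fig, ax, points
```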

You can also use numpy.meshgrid together with plt.imshow to add a colored background, as in the example here.
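The linked example is not reproduced here, but a minimal sketch of the idea might look like the following. It assumes each background cell is colored by its nearest centroid under Euclidean distance (as in k-means); nearest_centroid and plot_regions are hypothetical helper names, and step and pad are illustrative grid settings:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def nearest_centroid(points, centroids):
    """Index of the closest centroid (Euclidean distance) for each row of points."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def plot_regions(dataset, centroids, step=0.05, pad=0.5):
    """Shade the plane by nearest centroid, then draw points and centroids on top."""
    dataset = np.asarray(dataset, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k = len(centroids)
    base = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    cmap = ListedColormap([base[i % len(base)] for i in range(k)])
    x_min, y_min = dataset.min(axis=0) - pad
    x_max, y_max = dataset.max(axis=0) + pad
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    region = nearest_centroid(grid, centroids).reshape(xx.shape)
    fig, ax = plt.subplots()
    # origin="lower" puts row 0 of the grid (smallest y) at the bottom of the axes.
    ax.imshow(region, extent=(x_min, x_max, y_min, y_max), origin="lower",
              cmap=cmap, vmin=-0.5, vmax=k - 0.5, alpha=0.3, aspect="auto")
    labels = nearest_centroid(dataset, centroids)
    ax.scatter(dataset[:, 0], dataset[:, 1], c=labels, cmap=cmap, vmin=-0.5, vmax=k - 0.5)
    ax.scatter(centroids[:, 0], centroids[:, 1], s=70, c="r", marker="x")
    return fig, ax, region
```

A smaller step gives a smoother background at the cost of more distance computations.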

Upvotes: 1

Has QUIT--Anony-Mousse

Reputation: 77464

If you have numpy arrays, you should be able to use dataset[:,0] to access the first column much more efficiently.

I have found scatter to behave oddly sometimes (at least in the IPython notebook), but the plot function can do this, too.

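import matplotlib.pyplot as plt

# Line2D.markers also has integer keys and "nothing" entries, and dict keys
# are not indexable in Python 3, so use explicit format-string markers instead.
markers = list("o^sv*Dxp")
colors = list("bgrcmyk")
for i, cluster in enumerate(clusters):  # each cluster: a numpy array of shape (n, 2)
    marker, color = markers[i % len(markers)], colors[i % len(colors)]
    plt.plot(cluster[:, 0], cluster[:, 1], marker + color)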
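The loop above assumes clusters is already a list of per-cluster NumPy arrays, while the question has a flat list of integer labels. A minimal sketch that does the split and the plot-based drawing in one go might look like this; plot_by_label is a hypothetical helper, and leaving "r" out of the colors (so only the centroids are red) is a choice made here, not something from the answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np


def plot_by_label(dataset, labels, k, centroids=None):
    """Draw each cluster 0..k-1 as its own Line2D with a distinct marker and colour."""
    dataset = np.asarray(dataset, dtype=float)
    labels = np.asarray(labels)
    markers = list("o^sv*Dxp")
    colors = list("bgcmyk")  # 'r' left out so that only the centroids are red
    fig, ax = plt.subplots()
    lines = []
    for j in range(k):
        cluster = dataset[labels == j]  # rows of dataset that belong to cluster j
        fmt = markers[j % len(markers)] + colors[j % len(colors)]  # marker-only format string
        (line,) = ax.plot(cluster[:, 0], cluster[:, 1], fmt)
        lines.append(line)
    if centroids is not None:
        centroids = np.asarray(centroids, dtype=float)
        ax.plot(centroids[:, 0], centroids[:, 1], "r*", markersize=12)
    return fig, ax, lines
```

Because the format strings contain a marker but no line style, plot draws only the markers, so the result looks like a scatter plot.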

Upvotes: 0

willy

Reputation: 1490

See the color parameter in the pyplot.scatter documentation.

Basically, you need to split your data into clusters and then call pyplot.scatter in a loop, passing a different color as the color parameter each time.

You can use vq from scipy.cluster.vq to assign your data points to clusters using your centroids, like so:

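    from scipy.cluster.vq import vq

    # vq returns (codes, distances); codes[i] is the index of the nearest centroid
    assignments = vq(dataset, centroids_list)[0]
    clusters = [[] for _ in range(len(centroids_list))]  # one list per centroid
    for item, clustNum in zip(dataset, assignments):
        clusters[clustNum].append(item)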
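scipy.cluster.vq.vq returns a pair (codes, distances), where codes[i] is the index of the centroid closest to observation i by Euclidean distance. If SciPy is not available, a minimal sketch of the same assignment plus grouping in plain NumPy might look like this (assign_and_group is a hypothetical helper, not a SciPy function):

```python
import numpy as np


def assign_and_group(dataset, centroids):
    """Assign each point to its nearest centroid (Euclidean) and group the points.

    Returns (assignments, clusters): assignments[i] is the centroid index of
    point i, and clusters[j] is an array of the points assigned to centroid j.
    """
    dataset = np.asarray(dataset, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distances via broadcasting: shape (n_points, n_centroids).
    d2 = ((dataset[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    clusters = [dataset[assignments == j] for j in range(len(centroids))]
    return assignments, clusters
```

Squared distances are enough here because argmin picks the same centroid with or without the square root.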

At least, this is how I've done it before, if I remember correctly. From there, it's just a matter of defining a function that returns a random color, and then:

    for cluster in clusters:
        plt.scatter([item[0] for item in cluster],[item[1] for item in cluster],color=randomColor() ) 
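The answer leaves randomColor undefined. A minimal sketch of such a helper is below, along with a deterministic alternative (a swapped-in technique, not from the answer) that spreads k hues evenly around the HSV color wheel, since independent random colors can come out nearly identical. Both names and the saturation/value defaults are illustrative assumptions:

```python
import colorsys
import random


def randomColor(rng=random):
    """Hypothetical helper: a random RGB tuple with each channel in [0, 1)."""
    return (rng.random(), rng.random(), rng.random())


def distinct_colors(k, saturation=0.8, value=0.9):
    """Alternative: k evenly spaced hues on the HSV colour wheel, as RGB tuples.

    Hues are offset by half a step so the first cluster is not pure red
    (hue 0), which the centroids already use.
    """
    return [colorsys.hsv_to_rgb((i + 0.5) / k, saturation, value) for i in range(k)]
```

For example, zip(clusters, distinct_colors(len(clusters))) pairs each cluster with its own color for the scatter loop.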

Upvotes: 1
