user3600497
user3600497

Reputation: 1661

How can I account for identical data points in a scatter plot?

I'm working with some data that has several identical data points. I would like to visualize the data in a scatter plot, but scatter plotting doesn't do a good job of showing the duplicates.

If I change the alpha value, then the identical data points become darker, which is nice, but not ideal.

Is there some way to map the color of a dot to how many times it occurs in the data set? What about size? How can I assign the size of the dot to how many times it occurs in the data set?

Upvotes: 2

Views: 2751

Answers (1)

Christoph
Christoph

Reputation: 1557

As it was pointed out, whether this makes sense depends a bit on your dataset. If you have reasonably discrete points and exact matches make sense, you can do something like this:

import numpy as np
import matplotlib.pyplot as plt
test_x=[2,3,4,1,2,4,2]
test_y=[1,2,1,3,1,1,1]   #  I am just generating some test x and y values. Use your data here
#Generate a list of unique points
points=list(set(zip(test_x,test_y))) 
#Generate a list of point counts
count=[len([x for x,y in zip(test_x,test_y) if x==p[0] and y==p[1]]) for p in points]
#Now for the plotting:
plot_x=[i[0] for i in points]
plot_y=[i[1] for i in points]
count=np.array(count)
plt.scatter(plot_x,plot_y,c=count,s=100*count**0.5,cmap='Spectral_r')
plt.colorbar()
plt.show()

Notice: You will need to adjust the radius (the value 100 in th s argument) according to your point density. I also used the square root of the count to scale it so that the point area is proportional to the counts.

Also note: If you have very dense points, it might be more appropriate to use a different kind of plot. Histograms for example (I personally like hexbin for 2d data) are a decent alternative in these cases.

Upvotes: 3

Related Questions