Reputation: 27095
With vector backends (pdf, eps), it's wasteful in terms of file size and rendering time to have points that are completely occluded by other points. How can these be removed?
Upvotes: 2
Views: 354
Reputation: 13465
That is almost an unfair question, since the answer depends on the marker size relative to the data coordinates, which is difficult to calculate.
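For a rough idea of what that conversion involves, here is a minimal sketch. It assumes the default circular marker, linear axes, and a figure whose size and limits are already final; `marker_radius_in_data` is a hypothetical helper name, not part of matplotlib. In `scatter`, `s` is an area in points squared, so a circle marker is roughly `sqrt(s)` points across (the edge line width is ignored here).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def marker_radius_in_data(ax, s):
    """Approximate (rx, ry) radius of a circular scatter marker in data units.

    Hypothetical helper: assumes linear axes and that the axes limits and
    figure size will not change afterwards.
    """
    fig = ax.figure
    # s is in points^2; 1 point = 1/72 inch, so convert the diameter to pixels.
    diameter_px = np.sqrt(s) * fig.dpi / 72.0
    radius_px = diameter_px / 2.0
    # Map a pixel offset back into data coordinates.
    inv = ax.transData.inverted()
    origin = inv.transform((0.0, 0.0))
    corner = inv.transform((radius_px, radius_px))
    rx, ry = np.abs(corner - origin)
    return rx, ry


fig = plt.figure(figsize=(6, 6), dpi=100)
ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure
ax.set_xlim(-6, 6)
ax.set_ylim(-6, 6)
rx, ry = marker_radius_in_data(ax, s=90)
print(rx, ry)
```

The result only holds for the limits and figure size in effect when it is computed, which is part of why this is awkward in practice.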
In any case, perhaps a partial solution will do for you. The idea: compute the distance between all pairs of points, and whenever a pair is closer than a given tolerance, keep only one of the two points instead of both. This won't be perfect, but it might prove useful. A quick test of this idea (I'm hoping I got the distance logic right):
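import numpy as np
import matplotlib.pyplot as plt
import scipy.spatial.distance  # a bare "import scipy" is not guaranteed to load the submodule

x = np.random.normal(0,1,15000)
y = np.random.normal(0,1,15000)
tol = 0.01  # distance (in data units) below which two points count as overlapping
xy = np.hstack((x[:,np.newaxis],y[:,np.newaxis]))
# Full pairwise distance matrix: O(n^2) memory (about 1.8 GB of float64 for 15000 points)
d = scipy.spatial.distance.cdist(xy,xy)
b = np.ones(x.shape,dtype='bool')  # True = keep the point
for i in range(d.shape[0]-1):
    # If point i is still kept and its nearest later point is within tol, drop that neighbour
    if d[i,i+1:].min() < tol and b[i]:
        b[i+1+d[i,i+1:].argmin()] = False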
x2 = x[b]
y2 = y[b]
f, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x,y,s=90)
ax1.set_xlim(-6,6)
ax1.set_ylim(-6,6)
ax2.scatter(x2,y2,s=90)
ax2.set_xlim(-6,6)
ax2.set_ylim(-6,6)
print('Before: ', x.shape,'\nNow: ',x2.shape)
plt.show()
This gives me the following result:
Before: (15000,)
Now: (13004,)
That is a saving of about 2000 points out of 15000. If you look closely you'll notice that the result is not perfect, but I'm sure a little calibration of the tol argument could improve the plot significantly.
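If the full distance matrix is too large to hold in memory, the same tolerance idea can be sketched with a KD-tree. This is a variant, not the code above: it assumes `scipy.spatial.cKDTree` is available, `thin_points` is a hypothetical helper name, and it drops every later point within `tol` of a kept point rather than only the nearest one, so the counts will differ from the result shown earlier.

```python
import numpy as np
from scipy.spatial import cKDTree


def thin_points(x, y, tol):
    """Return a boolean mask of points to keep (hypothetical helper).

    Greedy rule: walk the points in index order; each point that is still
    kept removes all later points within `tol` of it.
    """
    xy = np.column_stack((x, y))
    tree = cKDTree(xy)
    # All index pairs (i, j) with i < j whose distance is at most tol.
    pairs = tree.query_pairs(r=tol, output_type='ndarray')
    # Process pairs in order of the first index so keep[i] is final
    # by the time point i is used to remove its neighbours.
    pairs = pairs[np.argsort(pairs[:, 0], kind='stable')]
    keep = np.ones(len(xy), dtype=bool)
    for i, j in pairs:
        if keep[i]:
            keep[j] = False
    return keep


rng = np.random.default_rng(0)
x = rng.normal(0, 1, 15000)
y = rng.normal(0, 1, 15000)
keep = thin_points(x, y, tol=0.01)
x2, y2 = x[keep], y[keep]
print('Before: ', x.shape, '\nNow: ', x2.shape)
```

With this rule no two remaining points are closer than `tol`, and every removed point lies within `tol` of a point that was kept, so `tol` plays the same role as above.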
Upvotes: 3