I'm plotting a dataset where the size of the data arrays is larger than the size of the figure, even larger than the resolution of my screen. As shown in the example below, matplotlib does a remarkably good job rendering the data. This is just an example dataset. My real dataset is far more unpredictable. I have a concern there may be occasions when some important data is not shown. How does matplotlib decide what to show?
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10000)
y = np.zeros(10000)
for i in range(0, 10000, 100):
    y[i] = x[i]
x_spikes = np.random.choice(x, size=10, replace=False)
y[x_spikes] = 10000 + x[x_spikes]
plt.plot(x, y)
print(sorted(x_spikes))
[375, 2828, 3494, 6526, 6855, 6902, 6923, 7117, 7831, 9558]
The plt.plot command creates one or more Line2D objects. Those lines have a linewidth, whose unit is points (the default is 1.5 points). The line is drawn at that width wherever the data takes it, so independent of the pixel resolution all data is shown; no data is lost.
What can happen, though, is that features get lost to antialiasing if you make the linewidth very narrow.
To ensure that does not happen, use a linewidth of at least one screen pixel, i.e. ppi/dpi, which is 72/dpi in matplotlib. The default dpi is 100, so as long as the linewidth is greater than or equal to 0.72 points, all points are shown. (In Jupyter the default dpi is often 72, hence 72/72 == 1, and a linewidth of 1 would be needed.)
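As a minimal sketch of that rule of thumb (the array values here are placeholders, not your real data), you can compute the one-pixel linewidth directly from the figure's dpi:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10000)
y = np.zeros(10000)
y[::100] = x[::100]  # sparse spikes, far narrower than one screen pixel

fig, ax = plt.subplots(dpi=100)
# one screen pixel expressed in points: 72 points per inch / pixels per inch
lw = 72 / fig.dpi  # 0.72 points at dpi=100
line, = ax.plot(x, y, linewidth=lw)
```

Any linewidth at or above this value guarantees each spike covers at least one output pixel; the matplotlib default of 1.5 points already satisfies it at dpi=100.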
All of this applies to lines. For bar plots, where the width is in data coordinates, this is different. Images may also not show all data, though imshow has an interpolation argument that lets you steer the interpolation behaviour.
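A small sketch of the imshow case (the spike pattern is made up for illustration): with interpolation="nearest", columns that fall between output pixels can vanish entirely, whereas "antialiased" (the default in recent matplotlib versions) averages neighbouring values so narrow features remain visible:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# an image far wider than any figure: isolated bright columns
data = np.zeros((50, 5000))
data[:, ::500] = 1.0

fig, axes = plt.subplots(2, 1, dpi=100)
# 'nearest' picks one input pixel per output pixel and may skip the spikes
axes[0].imshow(data, aspect="auto", interpolation="nearest")
# 'antialiased' downsamples with smoothing, so spikes still contribute
axes[1].imshow(data, aspect="auto", interpolation="antialiased")
```

Whether a given spike survives "nearest" depends on how the 5000 columns map onto the output pixels, which is exactly the unpredictability the question worries about.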