Christian Brinch

Reputation: 99

Matplotlib hexbin memory error

I am trying to make a hexbin plot in Python of a rather large data set. The two arrays containing the data each have 35 million entries. However, they only take up 1.5 GB of memory, and I have more than 4 GB of memory available. Hexbin still fails with a memory error:

*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "plotmodel.py", line 20, in <module>
plt.hexbin(d,t,m, bins='log', gridsize=20, xscale='log', lw=1, edgecolors='black',    alpha=0.8, cmap=plt.cm.jet)
File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2142, in hexbin
ret = ax.hexbin(x, y, C, gridsize, bins, xscale, yscale, extent, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, reduce_C_function, mincnt, marginals, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/matplotlib/axes.py", line 6035, in hexbin
iy2 = np.floor(y).astype(int)
MemoryError

Is there any fundamental reason why hexbin doesn't work on large data sets, or is the error due to hardware limitations?

Upvotes: 0

Views: 410

Answers (1)

tacaswell

Reputation: 87426

It looks like hexbin creates several np.ndarrays that are the same size as the input data (by a rough count I got to 8!). There is nothing fundamental about this; it is simply written this way to get the vectorization speed-up from numpy.
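To put a rough number on that, here is a back-of-the-envelope estimate. It assumes each temporary holds 8 bytes per entry (float64 or a 64-bit int, which is an assumption about the build in use) and takes the count of 8 temporaries from the rough count above. The helper name temp_memory_gb is made up for illustration.

```python
def temp_memory_gb(n_entries, n_temporaries, itemsize=8):
    """Estimate the memory (in GB, 1 GB = 1e9 bytes) used by full-size temporaries.

    itemsize=8 assumes float64 / 64-bit int elements; adjust for other dtypes.
    """
    return n_entries * itemsize * n_temporaries / 1e9


# One full-size array of 35 million entries.
per_array = temp_memory_gb(35_000_000, 1)
# Eight such temporaries alive at once (the rough count, not a measured value).
all_temporaries = temp_memory_gb(35_000_000, 8)
print(per_array, all_temporaries)
```

Under these assumptions each temporary is about 0.28 GB and eight of them come to roughly 2.24 GB on top of the input arrays, which is enough to exhaust the headroom described in the question.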

I would suggest pulling hexbin out of mpl and splitting it into three parts: the first sets up the details of the hex array, the second adds data to an accum array (so you can work on your data in chunks), and the third takes accum and the output of the first part to actually make the plot.
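A minimal sketch of that three-part split might look like the following. This is not matplotlib's code: the function names (setup_grid, add_chunk, hex_centers) are made up for illustration, it only counts points (no C values), and it assumes a hexagonal grid built from two interleaved rectangular lattices of bin centres, with each point assigned to the nearer centre. The extent has to be known before the first chunk is processed.

```python
import numpy as np


def setup_grid(xmin, xmax, ymin, ymax, gridsize=20):
    """Part 1: fix the hexagon layout up front so every chunk uses the same bins."""
    nx = gridsize
    ny = int(nx / np.sqrt(3))
    grid = {
        "xmin": xmin, "ymin": ymin,
        "sx": (xmax - xmin) / nx,  # horizontal spacing of bin centres
        "sy": (ymax - ymin) / ny,  # vertical spacing of bin centres
        "nx": nx, "ny": ny,
    }
    # Two interleaved lattices of centres; the second is offset by half a cell.
    accum = (np.zeros((nx + 1, ny + 1)), np.zeros((nx, ny)))
    return grid, accum


def add_chunk(grid, accum, x, y):
    """Part 2: add one chunk of points to the running counts, in place."""
    nx, ny = grid["nx"], grid["ny"]
    ix = (x - grid["xmin"]) / grid["sx"]
    iy = (y - grid["ymin"]) / grid["sy"]
    # Candidate centre on each lattice for every point in the chunk.
    ix1, iy1 = np.round(ix).astype(int), np.round(iy).astype(int)
    ix2, iy2 = np.floor(ix).astype(int), np.floor(iy).astype(int)
    # Scaled squared distance to each candidate; the nearer one wins.
    d1 = (ix - ix1) ** 2 + 3.0 * (iy - iy1) ** 2
    d2 = (ix - ix2 - 0.5) ** 2 + 3.0 * (iy - iy2 - 0.5) ** 2
    near1 = d1 < d2
    # Drop points that fall outside the grid instead of raising an IndexError.
    in1 = near1 & (ix1 >= 0) & (ix1 <= nx) & (iy1 >= 0) & (iy1 <= ny)
    in2 = ~near1 & (ix2 >= 0) & (ix2 < nx) & (iy2 >= 0) & (iy2 < ny)
    # np.add.at handles repeated indices correctly (plain += would not).
    np.add.at(accum[0], (ix1[in1], iy1[in1]), 1)
    np.add.at(accum[1], (ix2[in2], iy2[in2]), 1)


def hex_centers(grid, accum):
    """Part 3 (data side): flatten both lattices into centre coordinates and counts."""
    xs, ys, cs = [], [], []
    for k, counts in enumerate(accum):
        offset = 0.5 * k  # second lattice is shifted by half a cell
        cx = grid["xmin"] + grid["sx"] * (np.arange(counts.shape[0]) + offset)
        cy = grid["ymin"] + grid["sy"] * (np.arange(counts.shape[1]) + offset)
        gx, gy = np.meshgrid(cx, cy, indexing="ij")
        xs.append(gx.ravel())
        ys.append(gy.ravel())
        cs.append(counts.ravel())
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(cs)
```

With this split, the only full-size arrays in memory are the inputs themselves (or not even those, if the chunks are read from disk); the temporaries inside add_chunk are chunk-sized. You would call add_chunk in a loop over slices such as x[s:s + 1000000], then pass the output of hex_centers to something like ax.scatter(cx, cy, c=counts, marker='h') as a simple stand-in for the polygons hexbin draws. For a log x axis, feed in np.log10(x) and a matching extent. To reproduce the mean of a C array per bin, keep a second pair of accum arrays holding the summed C values and divide by the counts at the end.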

Upvotes: 2
