Reputation: 99
I am trying to make a hexbin plot in Python of a rather large data set. The two arrays containing the data are 35 million entries long. However, they only take up 1.5 GB of memory and I have more than 4 GB available, yet hexbin fails with a memory error:
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "plotmodel.py", line 20, in <module>
plt.hexbin(d,t,m, bins='log', gridsize=20, xscale='log', lw=1, edgecolors='black', alpha=0.8, cmap=plt.cm.jet)
File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2142, in hexbin
ret = ax.hexbin(x, y, C, gridsize, bins, xscale, yscale, extent, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, reduce_C_function, mincnt, marginals, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/matplotlib/axes.py", line 6035, in hexbin
iy2 = np.floor(y).astype(int)
MemoryError
Are there any fundamental reasons why hexbin doesn't work on large data sets, or is the error due to hardware limitations?
Upvotes: 0
Views: 410
Reputation: 87426
It looks like hexbin makes several np.ndarrays that are the same size as the input data (by rough count I got to 8!). There is nothing fundamental about this; it is simply written that way to get the vectorization speed-up from numpy.
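To put rough numbers on that (my own back-of-the-envelope estimate, using the sizes from the question and the ~8× count above): the temporaries alone add on the order of 2 GB on top of the input arrays, which is easily enough to run out of room, particularly in a 32-bit process.

```python
n = 35e6            # entries per array, from the question
itemsize = 8        # bytes per value, assuming float64
temporaries = 8     # rough count of full-size intermediate arrays in hexbin

print("extra memory for temporaries: ~%.1f GB" % (n * itemsize * temporaries / 1e9))
# -> ~2.2 GB, on top of the ~1.5 GB the input arrays already occupy
```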
I would suggest pulling hexbin out of mpl and splitting it into three parts: the first sets up the details of the hex array, the second adds data to an accum array (so you can work on your data in chunks), and the third takes accum and the output of the first part and actually makes the plot. A sketch of that idea is below.
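This is a minimal sketch, not matplotlib's API: it assumes a plain count accumulation (no C=m weights, no log x-scale, unlike the call in the question), and the names hexbin_accumulate and chunksize are mine. It mimics the two interleaved lattices of hexagon centres that hexbin uses internally, but only holds full-size temporaries for one chunk of points at a time.

```python
import numpy as np
import matplotlib.pyplot as plt

def hexbin_accumulate(x, y, extent, gridsize=100, chunksize=1000000):
    """Accumulate hexagonal-bin counts chunk by chunk to bound peak memory."""
    xmin, xmax, ymin, ymax = extent
    nx = gridsize
    ny = int(gridsize / np.sqrt(3))
    sx = (xmax - xmin) / nx
    sy = (ymax - ymin) / ny

    counts1 = np.zeros((nx + 1, ny + 1))   # lattice aligned with the grid
    counts2 = np.zeros((nx, ny))           # lattice offset by half a cell

    for start in range(0, len(x), chunksize):
        # scale this chunk into grid units
        cx = (x[start:start + chunksize] - xmin) / sx
        cy = (y[start:start + chunksize] - ymin) / sy

        ix1 = np.clip(np.round(cx).astype(int), 0, nx)
        iy1 = np.clip(np.round(cy).astype(int), 0, ny)
        ix2 = np.clip(np.floor(cx).astype(int), 0, nx - 1)
        iy2 = np.clip(np.floor(cy).astype(int), 0, ny - 1)

        # each point goes to whichever lattice has the nearer hexagon centre
        d1 = (cx - ix1) ** 2 + 3.0 * (cy - iy1) ** 2
        d2 = (cx - ix2 - 0.5) ** 2 + 3.0 * (cy - iy2 - 0.5) ** 2
        in1 = d1 < d2

        np.add.at(counts1, (ix1[in1], iy1[in1]), 1)
        np.add.at(counts2, (ix2[~in1], iy2[~in1]), 1)

    # hexagon centre coordinates, back in data units
    cx1, cy1 = np.meshgrid(np.arange(nx + 1) * sx + xmin,
                           np.arange(ny + 1) * sy + ymin, indexing='ij')
    cx2, cy2 = np.meshgrid((np.arange(nx) + 0.5) * sx + xmin,
                           (np.arange(ny) + 0.5) * sy + ymin, indexing='ij')

    centers = np.column_stack([np.concatenate([cx1.ravel(), cx2.ravel()]),
                               np.concatenate([cy1.ravel(), cy2.ravel()])])
    counts = np.concatenate([counts1.ravel(), counts2.ravel()])
    return centers, counts
```

The third part then just draws the accumulated counts, e.g. with hexagon markers (reusing the d and t arrays from the question):

```python
centers, counts = hexbin_accumulate(d, t,
                                    extent=(d.min(), d.max(), t.min(), t.max()),
                                    gridsize=20, chunksize=2000000)
nonzero = counts > 0
plt.scatter(centers[nonzero, 0], centers[nonzero, 1],
            c=np.log10(counts[nonzero]), marker='h', s=80, cmap=plt.cm.jet)
plt.colorbar()
plt.show()
```

Because only one chunk's worth of temporaries exists at any moment, the peak memory is set by chunksize rather than by the full 35-million-entry arrays.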
Upvotes: 2