user2018858

Reputation: 31

Memory Error in pyplot of large data

I am new to Python and to programming. I am trying to plot an orientation map using Python. I have a large number of points (about 1,200,000) on a plane, and each of them belongs to a cluster. Each cluster is supposed to have a different color. What I am doing currently is assigning a color to each cluster and drawing a filled circle at each point. I tried to do it in parts by creating plots for different segments and using blend to combine them. This is the code for that part (sn is the total number of points, label is the array of cluster numbers, and xcoor and ycoor are the coordinates of the points):

pylab.xlim([0,250])
pylab.ylim([0,100])
plt.savefig("HK pickle.png")
for l in range (1, 20):
    for j in range(int((float(sn)/80)*(l-1)), int((float(sn)/80)*(l))):
        overlay = Image.open("HK pickle.png")
        c = label[j] % 8
        if c == 0:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0.5, 0, 0))
        elif c == 1:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (1, 0, 0))
        elif c == 2:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0, 0.5, 0))
        elif c == 3:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0, 1, 0))
        elif c == 4:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0, 0, 0.5))
        elif c == 5:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0, 0 ,1))
        elif c == 6:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0.5, 0.5 ,0))
        elif c == 7:
            circle1 = plt.Circle((float(xcoor[j]), float(ycoor[j])), 0.05, color = (0.5, 0 ,0.5))
        fig = plt.gcf()
        fig.gca().add_artist(circle1)
        del circle1
    plt.savefig("HK pick.png")
    del fig
    back = Image.open("HK pick.png")
    comp = Image.blend(back, overlay, 0.5)
    comp.save("HK pickle.png", "PNG")
    del comp
pylab.xlim([0,250])
pylab.ylim([0,100])
plt.savefig("HK plots.png")

However, this leads to the following error:

    fig.gca().add_artist(circle1)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 1404, in add_artist
    self.artists.append(a)
MemoryError

The error arises at l = 11. I kept checking the task manager in parallel and it still had almost 3GB free memory when the MemoryError showed up. Please help me with this.

I am new to this and don't know whether the information I've given is enough. Please let me know if you need any more information.

Upvotes: 1

Views: 1642

Answers (2)

tacaswell

Reputation: 87416

You might do better with scatter and the keyword rasterized=True, which will flatten all of the vector graphics down to a raster image (which will take less memory).

Something like:

colors_lst = [ ... your tuples ...]
colors = [colors_lst[x % 8] for x in label]
ax.scatter(xcoor, ycoor, c=colors, rasterized=True)

I think this will replace most of your script.

scatter documentation
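
To make the suggestion concrete, here is a minimal sketch of how a single scatter call could stand in for the per-point Circle loop. The helper names colors_for and plot_clusters, the marker size s=1 and dpi=300 are assumptions for illustration, not part of the original script; the eight colors and the axis limits are taken from the question.

```python
# The eight colors from the question, indexed by label % 8.
COLORS = [
    (0.5, 0, 0), (1, 0, 0), (0, 0.5, 0), (0, 1, 0),
    (0, 0, 0.5), (0, 0, 1), (0.5, 0.5, 0), (0.5, 0, 0.5),
]


def colors_for(labels):
    """Return one RGB tuple per point, chosen by cluster label modulo 8."""
    return [COLORS[int(lab) % len(COLORS)] for lab in labels]


def plot_clusters(xcoor, ycoor, label, out_path="HK plots.png"):
    """Draw all points with one scatter call and save the figure."""
    # Imported here so the color lookup above can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend that renders to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # One collection for all points instead of one Circle artist per point.
    # s=1 (marker area in points^2) is an assumed size, tune it to taste.
    ax.scatter(xcoor, ycoor, s=1, c=colors_for(label), rasterized=True)
    ax.set_xlim(0, 250)
    ax.set_ylim(0, 100)
    fig.savefig(out_path, dpi=300)
    plt.close(fig)  # release the figure's memory once it is written
```

Calling plot_clusters(xcoor, ycoor, label) with the arrays from the question should produce the whole map in one pass, so the segment loop and the Image.blend step would no longer be needed.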

Upvotes: 1

danodonovan

Reputation: 20353

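If you're on a 32-bit OS or running 32-bit Python, you will not be able to work efficiently with large data sets: a 32-bit process can only address a few gigabytes no matter how much RAM is free, which would explain a MemoryError while the task manager still shows free memory. Installing 64-bit Python, NumPy, matplotlib etc. may fix this.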
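
If you are unsure which build you are running, a quick check along these lines should tell you. This is only a sketch; python_bits is a hypothetical helper name, while struct.calcsize and sys.maxsize are standard library.

```python
import struct
import sys


def python_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


print("Python is %d-bit" % python_bits())
# The same information, expressed through the largest native integer index.
print("64-bit build: %s" % (sys.maxsize > 2**32))
```

Note that this reports the interpreter build, not the OS: a 32-bit Python on 64-bit Windows still has the 32-bit limit.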

However, I would suggest first trying to draw your picture at a lower resolution and seeing if that works for you (the results may be good enough). For example, I would first replace the j iterator for j in range(int((float(sn)/80)*(l-1)), int((float(sn)/80)*(l))): with something like

for j in np.linspace(int((float(sn)/80)*(l-1)), int((float(sn)/80)*(l)), num=20):
    j = int(j)

which will give you 20 evenly spaced j values within your limits, rather than every integer value. Note that you will need to cast j to an int, because np.linspace returns floats by default and a float cannot be used as an index.

Other style remarks are less useful at this point, but in general you rarely need del - Python has a very good garbage collector that does this for you. You could also compute your limits outside of the loop header - this may make debugging more straightforward:

start_j = int((float(sn)/80)*(l-1))
end_j = int((float(sn)/80)*(l))
for j in np.linspace(start_j, end_j, num=20):
    etc.
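
Putting the two ideas together, a rough sketch of the subsampling could look like the following. It uses plain integer arithmetic in place of np.linspace so that no cast is needed; segment_bounds and subsample_indices are hypothetical helper names, and 80 parts and num=20 are simply the values used above.

```python
def segment_bounds(sn, l, parts=80):
    """Return (start, end) indices of segment l, as computed in the question."""
    start = int((float(sn) / parts) * (l - 1))
    end = int((float(sn) / parts) * l)
    return start, end


def subsample_indices(start, end, num=20):
    """Return up to num evenly spaced integer indices in [start, end)."""
    span = end - start
    if span <= 0:
        return []
    num = min(num, span)  # never ask for more indices than exist
    return [start + (k * span) // num for k in range(num)]


# Example with the question's point count: 20 indices from the first segment.
start_j, end_j = segment_bounds(1200000, 1)
for j in subsample_indices(start_j, end_j, num=20):
    pass  # draw point j here, e.g. using xcoor[j], ycoor[j], label[j]
```

Raising num step by step is an easy way to find out how many points the machine can handle before memory becomes a problem.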

Upvotes: 0
