bjornasm

Reputation: 2318

Runs out of memory when plotting, Python

I'm retrieving a large amount of data from a database, which I then plot as a scatter plot. However, I run out of memory and the program aborts when I use my full data set. For the record, the program takes more than 30 minutes to run, and the data list holds about 20-30 million rows.

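# Imports assumed from the names used below
import sqlite3 as lite
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

map = Basemap(projection='merc',
              resolution = 'c', area_thresh = 10,
              llcrnrlon=-180, llcrnrlat=-75,
              urcrnrlon=180, urcrnrlat=82)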

map.drawcoastlines(color='black')
# map.fillcontinents(color='#27ae60')
with lite.connect('database.db') as con:
    start = 1406851200
    end = 1409529600
    cur = con.cursor()
    cur.execute('SELECT latitude, longitude FROM plot WHERE unixtime >= {start} AND unixtime < {end}'.format(start = start, end = end))
    data = cur.fetchall()
    y,x = zip(*data)
    x,y = map(x,y)
    plt.scatter(x,y, s=0.05, alpha=0.7, color="#e74c3c", edgecolors='none')
    plt.savefig('Plot.pdf')
    plt.savefig('Plot.png')

I think my problem may be in the zip(*) call, but I really have no clue. I'm interested both in how I can use less memory by rewriting my existing code, and in how to split up the plotting process. My idea is to split the time period in half, then do the same thing once for each half before saving the figure, but I am unsure whether this will help at all. If the problem is the plotting itself, I have no idea what to do.

Upvotes: 3

Views: 1772

Answers (1)

titusjan

Reputation: 5546

If you think the problem lies in the zip function, why not use a numpy array to massage your data into the right format? Something like this:

import numpy

data = numpy.array(cur.fetchall())  # shape (n_rows, 2)
lat = data[:,0]
lon = data[:,1]
x,y = map(lon, lat)  # map is the Basemap instance from the question
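Note that cur.fetchall() still builds the full list of Python tuples before numpy copies it. If that list is what exhausts memory, one possible workaround is to count the rows first and fill a preallocated array in chunks with fetchmany(). The sketch below is only an illustration: the helper name load_lat_lon and the chunk_size default are my own assumptions, while the table and column names are taken from the question.

```python
import sqlite3

import numpy as np


def load_lat_lon(con, start, end, chunk_size=100_000):
    """Hypothetical helper: read latitude/longitude into float64 arrays.

    Rows are fetched chunk_size at a time, so only one chunk of Python
    tuples is alive at once instead of the whole result set.
    """
    where = "WHERE unixtime >= ? AND unixtime < ?"
    cur = con.cursor()

    # Count the matching rows so the array can be allocated once.
    cur.execute("SELECT COUNT(*) FROM plot " + where, (start, end))
    n_rows = cur.fetchone()[0]
    data = np.empty((n_rows, 2), dtype=np.float64)

    cur.execute("SELECT latitude, longitude FROM plot " + where, (start, end))
    filled = 0
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        # Assigning a list of 2-tuples fills the matching slice row by row.
        data[filled:filled + len(rows)] = rows
        filled += len(rows)

    return data[:filled, 0], data[:filled, 1]
```

With the connection from the question this would be called as lat, lon = load_lat_lon(con, 1406851200, 1409529600), followed by x, y = map(lon, lat). The query also uses ? placeholders rather than str.format, which sqlite3 supports directly.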

Also, your generated PDF will be very large and slow to render in PDF readers, because PDF is a vector format by default. All your millions of data points will be stored as floats and rendered when the user opens the document. I recommend that you add the rasterized=True argument to your plt.scatter() call. This will save the scatter points as a bitmap inside your PDF (see the matplotlib documentation on rasterization).
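As a rough sketch of the rasterized=True suggestion, the snippet below draws a scatter plot on plain axes (no Basemap) and saves it as a PDF. The helper name save_scatter_pdf, the Agg backend and the dpi value are assumptions made for illustration; the scatter styling is copied from the question.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt


def save_scatter_pdf(x, y, target, rasterized=True, dpi=300):
    """Hypothetical helper: scatter x/y and save the figure as a PDF.

    With rasterized=True the points are embedded as a bitmap, while axes,
    ticks and labels remain vector graphics.
    """
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, s=0.05, alpha=0.7, color="#e74c3c",
                        edgecolors="none", rasterized=rasterized)
    # dpi sets the resolution of the embedded bitmap.
    fig.savefig(target, format="pdf", dpi=dpi)
    plt.close(fig)  # release the figure's memory once it is saved
    return points
```

The target argument can be a file name such as 'Plot.pdf' or a file-like object. Raising dpi gives a sharper bitmap at the cost of a larger file.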

If none of this helps, I would investigate further by commenting out lines, starting at the end. That is, first comment out plt.savefig('Plot.png') and see if the memory use goes down. If not, comment out the line before that, and so on.
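To put numbers on "see if the memory use goes down", one option is to wrap each step in a small measuring helper built on the standard library's tracemalloc module. This is only a sketch: peak_memory_mib is a hypothetical name, and tracemalloc only sees allocations made through Python's memory allocators, so memory used inside some C extensions may not show up.

```python
import tracemalloc


def peak_memory_mib(step, *args, **kwargs):
    """Hypothetical helper: run step(*args, **kwargs) and report peak memory.

    Returns (result, peak) where peak is the highest traced allocation
    in MiB while the step was running.
    """
    tracemalloc.start()
    try:
        result = step(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20
```

For example, data, peak = peak_memory_mib(cur.fetchall) would report how much the fetch alone allocates, and the same wrapper can be applied to the zip, projection and savefig steps in turn.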

Upvotes: 2
