Reputation: 6551
I am building a simple model of the Milky Way and one of the things I need to store is a 3D grid of mass densities.
The problem is that if I put a rectangular box around the galaxy, most of the grid cells are empty. Which leaves me saving a lot of useless zeros. So the naive array seems wasteful:
galaxy = [[[0 for k in xrange(1601)] for j in xrange(1601)] for i in xrange(253)]
# then fill in i,j,k values that are non-zero
I tried to build a sparse array using a dictionary:
for x in range(1601):
for y in range(1601):
for z in range (253):
galaxy[str(x) + "," + str(y) + "," + str(z)] = # whatever
But, (aside from being ugly) the strings that I was using for keys were taking up more memory than I was saving. I got OutOfMemoryError
s because (I calculated) the keys alone were taking a couple of gigs of memory.
At some point, I will want to increase the resolution of my model, and that will mean a much larger grid. Is there a more efficient way to store my values than using a 3D-Array of floats?
I am also concerned about the time it takes to iterate through all the cells (or just the non-zero cells in my grid. This will be very important.
Upvotes: 3
Views: 1638
Reputation: 168646
Quick math: 1601 * 1601 * 253 => 648489853 items
. A test indicates that the dictionary takes about 24 bytes per entry on a 32-bit machine, 49 bytes on a 64-bit machine, so that's
15,563,756,472 bytes (or 30GB on 64-bit). 10% of that is 1.5GB (or 3.0GB on 64-bit). If you have a 64-bit system with a bunch of memory, I think you'll be okay with a sparse representation.
I recommend:
Here is one possibility:
class SparseDict(dict):
def __init__(self, default_value):
dict.__init__(self)
self._value = default_value
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self._value
def __setitem__(self, key, val):
# I'm sure this can go faster if I were smarter
if val == self._value:
if key in self:
del self[key]
else:
dict.__setitem__(self, key, val)
def test(galaxy):
import sys
print len(galaxy), sys.getsizeof(galaxy)
# test is 1/10th size in each dimension,
# so 1/1000th of the volume
for x in range(160):
for y in range(160):
for z in range (25):
import random
# 90% of space is essentially a vacuum
if random.random() < .1:
galaxy[x,y,z] = 1502100
else:
galaxy[x,y,z] = 0
print len(galaxy), sys.getsizeof(galaxy)
test(SparseDict(0))
Upvotes: 2
Reputation: 3935
Maybe try to save your data in sql tables and load only a subset of the cube according to your needs. this will cost you in time to load parts , but will save you memory. as for the in memory represntation, use the methods suggested in the other answers such as dictionary etc...
Upvotes: 0
Reputation: 1108
Try using the dictionary approach, but only store key:value pairs for keys whose value is non-zero. A better key might be a tuple of (x,y,z).
Upvotes: 2