thkang
thkang

Reputation: 11533

python multidimensional boolean array?

it would contain at most 1000 x 1000 x 1000 elements, which is too big for python dictionary.

with dict, around 30 x 1000 x 1000 elements, on my machine it already consumed 2gb of memory and everything got stoned.

any modules that can handle 3-dimension array whose value would be only True/False? I check bitarray http://pypi.python.org/pypi/bitarray, which seems reasonable and coded in C, however it seems more like a bit-stream instead of an array, since it supports only 1 dimension.

Upvotes: 2

Views: 7462

Answers (4)

EnricoGiampieri
EnricoGiampieri

Reputation: 6085

numpy is your friend:

import numpy as np
a = np.zeros((1000,1000,1000), dtype=bool)
a[1,10,100] = True

Has a memory footprint as little as possible.

EDIT:

If you really need you can also look at the defaultdict class container in the collections module, which doesn't store the values that are of the default value. But if it's not really a must, use numpy.

Upvotes: 6

abarnert
abarnert

Reputation: 365597

numpy has already been suggested by EnricoGiampieri, and if you can use this, you should.

Otherwise, there are two choices:

A jagged array, as suggested by NPE, would be a list of list of bitarrays. This allows you to have jagged bounds—e.g., each row could be a different width, or even independently resizable:

bits3d = [[bitarray.bitarray(1000) for y in range(1000)] for x in range(1000)]
myvalue = bits3d[x][y][z]

Alternatively, as suggested by Xymostech, do your own indexing on a 1-D array:

bits3d = bitarray.bitarray(1000*1000*1000)
myvalue = bits3d[x + y*1000 + z*1000*1000]

Either way, you'd probably want to wrap this up in a class, so you can do this:

bits3d = BitArray(1000, 1000, 1000)
myvalue = bits3d[x, y, z]

That's as easy as:

class Jagged3DBitArray(object):
    def __init__(self, xsize, ysize, zsize):
        self.lll = [[bitarray(zsize) for y in range(ysize)] 
                    for x in range(xsize)]
    def __getitem__(self, key):
        x, y, z = key
        return self.lll[x][y][z]
    def __setitem__(self, key, value):
        x, y, z = key
        self.lll[x][y][z] = value

class Fixed3DBitArray(object):
    def __init__(self, xsize, ysize, zsize):
        self.xsize, self.ysize, self.zsize = xsize, ysize, zsize
        self.b = bitarray(xsize * ysize * zsize)
    def __getitem__(self, key):
        x, y, z = key
        return self.b[x + y * self.ysize + z * self.ysize * self.zsize]
    def __setitem__(self, key, value):
        x, y, z = key
        self.b[x + y * self.ysize + z * self.ysize * self.zsize] = value

Of course if you want more functionality (like slicing), you have to write a bit more.

The jagged array will use a bit more memory (after all, you have the overhead of 1M bitarray objects and 1K list objects), and may be a bit slower, but this usually won't make much difference.

The important deciding factor should be whether it's inherently an error for your data to have jagged rows. If so, use the second solution; if it might be useful to have jagged or resizable rows, use the former. (Keeping in mind that I'd use numpy over either solution, if at all possible.)

Upvotes: 2

inspectorG4dget
inspectorG4dget

Reputation: 113905

How about getting inspired by unix file permissions? 755 is read,write,execute for owner and read,execute for everyone else. This is because 7 translates to binary 111.

So your 1000x1000x1000 bool array could be a 1000x1000 list of ints in which the binary representation of each int gives you a 1000 "bit" string representing the bool array.

All of that should fit in under 1GB of memory

Upvotes: 0

NPE
NPE

Reputation: 500167

How about a list of lists of bitarrays, perhaps wrapped into your own class with a nice API?

Alternatively, an 3D NumPy array of integers, with your own code packing/unpacking multiple booleans into each integer.

Upvotes: 2

Related Questions