Mr. T
Mr. T

Reputation: 12410

Memory overflow in numpy boolean array

I am trying to create a large Boolean array (for a prime number sieve). I used first Python lists, but at limit = 10^9 this created a MemoryError.

boolarray = [True] * limit

Then I learned about Numpy and read that it is better with space organisation, so I tried

boolarray = np.full(limit, True, dtype = bool)

The limit only marginally increased to 10^10, which is not sufficient, since I need 10^12. I find this surprising, you just need a bit for Boolean, don't you? Any idea, how to overcome this problem? Thanks in advance.

Upvotes: 2

Views: 815

Answers (1)

MB-F
MB-F

Reputation: 23637

Let's put aside the fact that 10^12 bits will probably not easily fit into memory. If you care more about memory usage than performance you can pack the bits into a byte array. This comes at the cost of additional computations when reading/writing bits (which is the reason numpy stores booleans as bytes).

import numpy as np


def getbit(bitarray, index):
    i, j = index // 8, index % 8
    x = bitarray[i]
    return x & (1 << j) != 0


def setbit(bitarray, index, value):
    value = bool(value)
    i, j = index // 8, index % 8
    x = bitarray[i]
    bitarray[i] ^= (np.uint(-value) ^ x) & (1 << j)


n = 10**5 // 8
bitarray = np.zeros(n, dtype=np.uint8)  # initialize all bits to 0

print(getbit(bitarray, 19))  # False

setbit(bitarray, 19, True)
print(getbit(bitarray, 19))  # True

setbit(bitarray, 19, False)
print(getbit(bitarray, 19))  # False

Upvotes: 2

Related Questions