Reputation: 7161
I have a very big list of 0's and 1's that are represented as integers - by default - by python, I think: [randint(0, 1) for i in range(50*98)]
And I want to optimize the code so that it utilizes much less memory. The obvious way is to use just 1 bit to represent each of these numbers.
Is it possible to build a list of real binary numbers in python?
Regards, Bruno
EDIT: Thank you all.
From the answers I found out Python doesn't do this by default, so I found this library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations:
python-bitstring
Upvotes: 3
Views: 5396
Reputation: 86652
As delnan already said in a comment, you will not be able to use real binary numbers if you mean bit-for-bit equivalent memory usage.
Integers (or longs) are of course real binary numbers in the meaning that you can address individual bits (using bit-wise operators, but that is easily hidden in a class). Also, long
objects can become arbitrarily large, i.e. you can use them to simulate arbitrarily large bitsets. It is not going to be very fast if you do it in Python, but not very difficult either and a good start.
Using your binary generation scheme from above, you can do the following:
reduce(
lambda (a, p), b: (b << p | a, p + 1),
(random.randint(0, 1) for i in range(50*98)),
(0, 0)
)[0]
Of course, random
supports arbitrarily large upper boundaries, so you can do just that:
r = random.randint(0, 2**(50*98))
This is not entirely the same, since the individual binary digits in the are not independent in the same way they are independent when you create each digit for itself. Then again, knowing you pRNGs work, they are not really independent in the other case, either. If this is a concern for you, you probably should not use the random
module at all, but a hardware RNG.
Upvotes: 2
Reputation: 21925
This uses the bitstring module and constructs a BitArray
object from your list:
from bitstring import BitArray
b = BitArray([randint(0, 1) for i in range(50*98)])
Internally this is now stored packed as bytes so will take considerably less memory. You can slice, index, check and set bits etc. with the usual notation and there additional methods such as set
, all
and any
to modify the bits.
To get the data back as a binary string just use b.bin
and to get out the byte packed data use b.tobytes()
which will pad with zero bits up to a byte boundary.
Upvotes: 4
Reputation: 96994
Perhaps you could look into a lossless compression scheme for your data? Presumably there will be a lot of redundancy in such a list.
Upvotes: 0
Reputation: 35089
It seems you need some kind of bit-set. I'm not sure if this example entirely suits your needs, but it is worth a try.
Upvotes: 0
Reputation: 130004
It's called a bit vector, or a bitmap. Try e.g. BitVector. If you want to implement it yourself, you need to use a numeric object, not a list, and use bitwise operations to toggle bits, e.g.
bitmap = 0
bit = (1 << 24)
bitmap |= bit # enable bit
bitmap &= ~bit # disable bit
Upvotes: 1