jbssm
jbssm

Reputation: 7161

Building a list with binary numbers in binary representation in Python

I have a very big list of 0's and 1's that are represented as integers - by default - by python, I think: [randint(0, 1) for i in range(50*98)]

And I want to optimize the code so that it utilizes much less memory. The obvious way is to use just 1 bit to represent each of these numbers.

Is it possible to build a list of real binary numbers in python?

Regards, Bruno

EDIT: Thank you all.
From the answers I found out Python doesn't do this by default, so I found this library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations: python-bitstring

Upvotes: 3

Views: 5396

Answers (5)

Torsten Marek
Torsten Marek

Reputation: 86652

As delnan already said in a comment, you will not be able to use real binary numbers if you mean bit-for-bit equivalent memory usage.

Integers (or longs) are of course real binary numbers in the meaning that you can address individual bits (using bit-wise operators, but that is easily hidden in a class). Also, long objects can become arbitrarily large, i.e. you can use them to simulate arbitrarily large bitsets. It is not going to be very fast if you do it in Python, but not very difficult either and a good start.

Using your binary generation scheme from above, you can do the following:

reduce(
    lambda (a, p), b: (b << p | a, p + 1), 
    (random.randint(0, 1) for i in range(50*98)),
    (0, 0)
)[0]

Of course, random supports arbitrarily large upper boundaries, so you can do just that:

r = random.randint(0, 2**(50*98))

This is not entirely the same, since the individual binary digits in the are not independent in the same way they are independent when you create each digit for itself. Then again, knowing you pRNGs work, they are not really independent in the other case, either. If this is a concern for you, you probably should not use the random module at all, but a hardware RNG.

Upvotes: 2

Scott Griffiths
Scott Griffiths

Reputation: 21925

This uses the bitstring module and constructs a BitArray object from your list:

from bitstring import BitArray
b = BitArray([randint(0, 1) for i in range(50*98)])

Internally this is now stored packed as bytes so will take considerably less memory. You can slice, index, check and set bits etc. with the usual notation and there additional methods such as set, all and any to modify the bits.

To get the data back as a binary string just use b.bin and to get out the byte packed data use b.tobytes() which will pad with zero bits up to a byte boundary.

Upvotes: 4

Alex Reynolds
Alex Reynolds

Reputation: 96994

Perhaps you could look into a lossless compression scheme for your data? Presumably there will be a lot of redundancy in such a list.

Upvotes: 0

Constantinius
Constantinius

Reputation: 35089

It seems you need some kind of bit-set. I'm not sure if this example entirely suits your needs, but it is worth a try.

Upvotes: 0

Cat Plus Plus
Cat Plus Plus

Reputation: 130004

It's called a bit vector, or a bitmap. Try e.g. BitVector. If you want to implement it yourself, you need to use a numeric object, not a list, and use bitwise operations to toggle bits, e.g.

 bitmap = 0
 bit = (1 << 24)
 bitmap |= bit  # enable bit
 bitmap &= ~bit # disable bit

Upvotes: 1

Related Questions