Dan
Dan

Reputation: 10171

Extremely large Boolean list in Python

I want to create an object in python that is a collection of around 200,000,000 true/false values. So that I can most effectively change or recall any given true/false value, so that I can quickly determine if any given number, like 123,456,000 is true or false or change its value.

Is the best way to do this a list? or an array? or a class? or just a long int using bit operations? or something else?

I'm a bit noob so you may have to spell things out for me more than if I were asking the question in one of the other languages I know better. Please give me examples of how operating on this object would look.

Thanks

Upvotes: 8

Views: 6777

Answers (6)

Matthias Urlichs
Matthias Urlichs

Reputation: 2534

If the bits which are set are mostly consecutive, there's also the option to store a list of ranges, e.g. the PyPI module https://pypi.org/project/range_set/ which is API compatible to Python's set class.

Upvotes: 0

Scott Griffiths
Scott Griffiths

Reputation: 21925

You might also like to try the bitstring module, which is pure Python. Internally it's all stored as a byte array and the bit masking and shifting is done for you:

from bitstring import BitArray
# Initialise with two hundred million zero bits
s = BitArray(200000000)
# Set a few bits to 1
s.set(1, [76, 33, 123456000])
# And test them
if s.all([33, 76, 123456000]):
    pass

The other posters are correct though that a simple set might be a better solution to your particular problem.

Upvotes: 3

Michael Dillon
Michael Dillon

Reputation: 32392

At first glance, the Python BitVector module sounds like it does exactly what you want. It's available at http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.5.1.html and since it is pure Python code, it will run on any platform with no compiling needed.

You mentioned that you need some speed in getting and setting any arbitrary true-false value. For that you need to use a Python array, rather than a list, and if you go to the above URL and browse the source code for BitVector you can see that it does indeed rely on Python arrays.

Ideally, you would encapsulate what you are doing in a class that subclasses from BitVector, i.e.

class TFValues(BitVector):
   pass

That way you can do things like add a list to contain associated info such as the name of a particular TF value.

Upvotes: 1

S.Lott
S.Lott

Reputation: 391872

"quickly determine if any given number, like 123,456,000 is" in the "true" set or "false" set.

This is what a set is for.

The "true" set is a set of all the numbers.

To make a number's boolean flag "true", add it to the true set.

To make a number's boolean flag "false", remove it from the true set.

Life will be much simpler.

Upvotes: 4

Lukáš Lalinský
Lukáš Lalinský

Reputation: 41306

You can try the bitarray module, or write a similar thing using an array of integers yourself.

Upvotes: 13

jldupont
jldupont

Reputation: 96746

Have you considered using a lightweight database like SQLite?

Upvotes: 3

Related Questions