Krimson
Krimson

Reputation: 7664

Handle an extremely large list for the Sieve of Eratosthenes

In my python code, I have the following line

vals = [True] * n

This code works when n is a small number but I want this to work when n ~ 100,000,000 without getting a MemoryError

I even tried using a dictionary like this but it still wont work

vals = {}
for i in range(0, n):
    vals[i] = True

Any suggestions?

Edit For those of you wondering what I am trying to do. I am trying to implement the Sieve of Eratosthenes algorithm to generate prime numbers. The vals variable keeps track of the composite numbers which are found later on in the algorithm

Edit 2 I don't think so I can use generators because I need to able to change values later on like this:

vals = [True] * n
for i in range(0, n):
    if condition:
        vals[i] = False

Upvotes: 0

Views: 490

Answers (5)

ivan_pozdeev
ivan_pozdeev

Reputation: 36036

Do not keep all the data in memory, only keep what is needed at the moment or keep some other representation so generate it on the fly.

In this case, you should probably use a "thin air" representation instead:

  • keep the list of all the primes found so far and
  • for each prime, keep the composite number that this prime's filter was last triggered on
  • Then, walk the number axes a single time and check for each prime-last_multiple pair if the current number is the next multiple

Alternatively, you can offload your data to disk and e.g. use offsets to access elements (that would be rather slow) or write a class with a list-like interface that would use disk and a cache, loading and offloading data when needed (mmap is a good backing choice IMO).

Upvotes: 1

user1277476
user1277476

Reputation: 2909

Check these out:

http://stromberg.dnsalias.org/~strombrg/bits_mod/

http://stromberg.dnsalias.org/~strombrg/sieve/

The first is a bit array type that uses at least 31 bits of multiple ints.

The second is a sieve built on that bit array type.

If you still need something that won't fit into memory, you might look into modifying bits_mod to use mmap, as done in, for example: http://stromberg.dnsalias.org/~strombrg/drs-bloom-filter/

Here's some timing information, using pypy3 2.4.0, in case you decide to go the numpy route, but remain curious how pypy3 could do:

$ time ./sieve 100000000 > /dev/null
Creating bit array
Clearing multiples of 2
Clearing multiples of 3
Clearing multiples of 5
Clearing multiples of 7
...
Clearing multiples of 9931
Clearing multiples of 9941
Clearing multiples of 9949
Clearing multiples of 9967
Clearing multiples of 9973

real    1m10.749s
user    1m8.133s
sys     0m2.405s

Upvotes: -2

pad
pad

Reputation: 2396

You can use generators

vals = lambda n: (True for _ in range(n))

EDIT: since OP asked for a numpy implementation

n = 100000000
prime = lambda n: list(filter(lambda x: (x % np.arange(2, 1+int(math.sqrt(x)))).all(), range(2, n+1)))

Upvotes: 0

chapelo
chapelo

Reputation: 2562

Using numpy you can handle your large list:

import numpy as np

vals = np.ones(100000000,bool)
print(vals)
# [ True  True  True ...,  True  True  True]

Upvotes: 3

user3426575
user3426575

Reputation: 1773

You might want to use a generator:

def vals():
    for i in range(0, n):
        yield True

For compatibility with Python 2.x, it might be preferable to use xrange:

try:
    xrange
except NameError:
    xrange = range

def vals():
    for i in xrange(0, n):
        yield True

Otherwise, your program will silently run out of resources if run with an older interpreter.

Upvotes: 0

Related Questions