Reputation: 10636
I'm trying to implement the Sieve of Eratosthenes in Python. However, when trying to find all primes up to the square root of, for instance, 779695003923747564589111193840021, I get an error saying the result of range() has too many items. My question is: how do I avoid this problem? If I build the list with a while loop instead, I get an error saying I'm using too much memory (before it even starts to use the pagefile). The two versions are listed below:
Using range():

    maxnum = 39312312323123123
    primes = []
    seq = []
    i = 0
    seq = range(2, maxnum)
    for i in seq:
        mul = i * seq
        for j in mul:
            try:
                seq.remove(j)
            except:
                pass
        primes.append(i)
    print primes
Using while:

    maxnum = 39312312323123123
    primes = []
    seq = []
    i = 0
    while i < maxnum:
        seq.append(i)
        i += 1
    for i in seq:
        mul = i * seq
        for j in mul:
            try:
                seq.remove(j)
            except:
                pass
        primes.append(i)
    print primes
Upvotes: 4
Views: 2946
Reputation: 304255
Your algorithm is broken. Get it to work for maxnum=100 first.
Once you get it working, you will find that maxnum=100000000 takes a long, long time to run.
Plot the running time for maxnum in (10, 100, 1000, 10000, 100000, 1000000, ...) and you may be able to extrapolate how long 39312312323123123 will take :)
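A sketch of that measurement (sieve here is just a stand-in for whatever fixed version of your function you end up with):

    import time

    def sieve(maxnum):
        # stand-in: swap in your fixed implementation
        return [p for p in xrange(2, maxnum)
                if all(p % d for d in xrange(2, int(p ** 0.5) + 1))]

    for maxnum in (10, 100, 1000, 10000, 100000):
        start = time.time()
        sieve(maxnum)
        print maxnum, time.time() - start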
Upvotes: 2
Reputation: 8966
Try this:

    def getPrimes(maxnum):
        primes = []
        for i in xrange(2, maxnum):
            is_mul = False
            for j in primes:        # try dividing by all previous primes
                if i % j == 0:
                    is_mul = True   # once we find a prime that i is divisible by,
                    break           # short-circuit so we don't try the rest
            if not is_mul:          # i isn't a multiple of any previous prime,
                primes.append(i)    # so it must itself be prime
        return primes
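For example, with a small limit:

    >>> getPrimes(30)
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]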
You shouldn't run out of memory until you get to a very large number of primes. This way you don't have to worry about creating a list of multiples. Not sure if this still counts as the sieve though.
Actually, this won't work for maxnum = 39312312323123123. Using the prime number theorem we can estimate that there are approximately 1,028,840,332,567,181 prime numbers in that range.
As pointed out in this question, the maximum size of a Python list on a 32-bit system is 536,870,912 elements. So even if you don't run out of memory, you still won't be able to finish the calculation.

You shouldn't have that problem with a 64-bit system, though. 2 ** 64 is 18,446,744,073,709,551,616, and using the math from the aforementioned question, (2 ** 64) / 8 puts the maximum number of elements in a list at 2,305,843,009,213,693,952, which is greater than the estimated number of primes you will encounter.
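A quick way to check both estimates in Python:

    import math
    n = 39312312323123123
    print n / math.log(n)   # ~1.03e15 primes expected, per the prime number theorem
    print (2 ** 64) / 8     # 2305843009213693952 maximum list elements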
Edit:
To avoid memory issues, you could store your list of primes in a file on the hard disk. Store one prime per line and read the file every time you check a new number.
Maybe something like this:
    import os

    primes_path = r'C:\temp\primes.txt'

    def genPrimes():
        if not os.path.exists(primes_path):   # no primes stored yet
            return
        for line in open(primes_path, 'r'):
            yield int(line.strip())

    def addPrime(prime):
        primes_file = open(primes_path, 'a')
        primes_file.write('%s\n' % prime)
        primes_file.close()

    def findPrimes(maxnum):
        for i in xrange(2, maxnum):
            is_mul = False
            for prime in genPrimes():   # generate the primes from the file on disk
                if i % prime == 0:
                    is_mul = True
                    break
            if not is_mul:
                addPrime(i)             # append the new prime to the end of the file
At the end, you would have a file on your hard drive that contained all your primes.
OK, so this would be pretty slow, but you wouldn't run out of memory. You could make it faster by speeding up your file reads and writes (a RAID array, for example).
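Usage might look like this (note that each run appends to the existing file, so delete it between runs):

    findPrimes(100)
    print list(genPrimes())   # [2, 3, 5, 7, ..., 97]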
Upvotes: 0
Reputation: 26709
I would say "use xrange() instead", but you are actually using the list of ints as the sieve result... so an integer generator is not a correct solution here.

I think it will be difficult to materialize a list with 39312312323123123 elements in it, no matter what function you use to do so. That is, after all, 279 petabytes of 64-bit integers.
Try this:

    import itertools

    MAXNUM = 100   # start small; raise it once this works

    class FoundComposite(Exception):
        pass

    primes = [2]

    seq = itertools.takewhile(    # take integers from a sequence
        lambda x: x < MAXNUM,     # until we reach MAXNUM
        itertools.count(2))       # the integers starting from 2

    #seq = xrange(2, MAXNUM)      # alternatively

    for i in seq:
        try:
            for divisor in primes:
                if not (i % divisor):
                    # no remainder - divisor divides i exactly,
                    # so continue to the next i in seq
                    raise FoundComposite
            # if this is reached, we have tried all divisors
            primes.append(i)
        except FoundComposite:
            pass
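With MAXNUM = 30 this leaves primes == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]. Incidentally, Python's for/else gives the same control flow without the custom exception:

    for i in seq:
        for divisor in primes:
            if not (i % divisor):
                break
        else:                    # loop ended without break: no divisor found
            primes.append(i)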
Upvotes: 6
Reputation:
It's a more complex algorithm, perhaps technically not counting as the sieve, but one approach is to avoid removing all multiples of a given prime at once, and instead queue only the next multiple (along with the prime that produced it). This can be used in a generator implementation. The queue will still end up containing a lot of (multiples of) primes, but not as many as by building and then filtering a list.
First few steps done manually, to show the principle (each queue entry is a tuple of (next multiple, prime)):

    2 - queue is empty, so 2 is prime; queue (4, 2)
    3 - smallest queued item is (4, 2), and 4 > 3, so 3 is prime; queue (6, 3)
    4 - matches (4, 2), so 4 is composite; replace that entry with (6, 2)
    5 - smallest queued items start at 6 > 5, so 5 is prime; queue (10, 5)
    6 - matches both (6, 2) and (6, 3), so 6 is composite; replace with (8, 2) and (9, 3)

Note - the queue isn't a FIFO. You will always be extracting the tuples with the lowest first item, but new/replacement tuples don't (usually) have the highest first item, and (as with 6 above) there will be duplicates.
To handle the queue efficiently in Python, I suggest a dictionary (i.e. a hashtable) keyed by the first item of the tuple. The data is a set of second-item values (the original primes). A sketch follows.
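Something along these lines, perhaps (gen_primes is a made-up name; following the manual steps above, each new prime p is first queued at 2*p, though starting at p*p would be the usual optimization):

    import itertools

    def gen_primes(maxnum):
        # queue: dict keyed by the next multiple; the value is the set of
        # primes producing that multiple (a set absorbs duplicates like 6)
        queue = {}
        for i in itertools.takewhile(lambda x: x < maxnum,
                                     itertools.count(2)):
            if i not in queue:
                yield i                  # not a queued multiple of anything: prime
                queue[i + i] = set([i])  # queue its next multiple
            else:
                # i is composite; advance every prime that was waiting on it
                for p in queue.pop(i):
                    queue.setdefault(i + p, set()).add(p)

list(gen_primes(30)) gives [2, 3, 5, 7, 11, 13, 17, 19, 23, 29].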
As suggested elsewhere, test with small targets before trying for the big one. And don't be too surprised if you fail. It may still be that you need too many heap-allocated large integers at one time (in the queue) to complete the solution.
Upvotes: 2
Reputation: 28869
About the memory limit: how about creating a custom list class that is internally a linked list of smaller lists or arrays? It would traverse from one to the other transparently, adding more as needed, while the caller uses it through an external interface similar to the list members (.append, .remove, etc.) your code relies on.

Note: I'm not a Python programmer, so I have no clue how to implement this in Python, and maybe I'm missing some context here; I'll understand if I'm voted down.

Maybe use "generators", as they're called in Python, to yield the results of the internal lists as if they were one huge single list, possibly combined with a linked list. A rough sketch is below.
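Something like this, perhaps (all names are made up; a real version would also need .remove and the rest of the list interface):

    class ChunkedList(object):
        """Stores items in fixed-size chunks instead of one huge list."""
        def __init__(self, chunk_size=1000000):
            self.chunk_size = chunk_size
            self.chunks = [[]]

        def append(self, item):
            if len(self.chunks[-1]) >= self.chunk_size:
                self.chunks.append([])       # grow by adding another chunk
            self.chunks[-1].append(item)

        def __iter__(self):                  # a generator: walks the chunks
            for chunk in self.chunks:        # as if they were one list
                for item in chunk:
                    yield item

This sidesteps any single-allocation limit, but the total memory needed is of course unchanged.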
Upvotes: 0
Reputation: 304255
There is a third-party module for Python called gmpy. It has a couple of functions that may be useful to you, as they are very fast. The probabilistic stuff kicks in around the 4 billion mark.
    next_prime(...)
        next_prime(x): returns the smallest prime number > x. Note that
        GMP may use a probabilistic definition of 'prime', and also that
        if x<0 GMP considers x 'prime' iff -x is prime; gmpy reflects these
        GMP design choices. x must be an mpz, or else gets coerced to one.

    is_prime(...)
        is_prime(x,n=25): returns 2 if x is _certainly_ prime, 1 if x is
        _probably_ prime (probability > 1 - 1/2**n), 0 if x is composite.
        If x<0, GMP considers x 'prime' iff -x is prime; gmpy reflects this
        GMP design choice. x must be an mpz, or else gets coerced to one.
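A quick usage sketch, assuming gmpy is installed:

    import gmpy

    n = 39312312323123123
    print gmpy.is_prime(n)     # 2 = certainly prime, 1 = probably, 0 = composite
    print gmpy.next_prime(n)   # smallest (probable) prime greater than n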
Upvotes: 1
Reputation: 46627
range() returns a list containing all the numbers in the requested range, while xrange() is a lazy sequence that yields the numbers one after another, with memory consumption close to zero.
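For example, in Python 2:

    >>> range(5)            # builds the whole list in memory at once
    [0, 1, 2, 3, 4]
    >>> xrange(5)           # lazy; values are produced on demand
    xrange(5)
    >>> sum(xrange(10**8))  # fine; range(10**8) would build a 100-million-element list
    4999999950000000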
Upvotes: 0