2012ssohn

Reputation: 159

Handling a List of 1000000+ Elements

I am trying to find the distribution of prime gaps for primes less than 100,000,000.

My method:

Step 1: Start with a TXT file "primes.txt" which has a list of primes (up to 10,000,000).

Step 2: Have the program read the file and then insert each number into a list, p1.

Step 3: Find the square root of the upper bound (10 times the upper bound of the primes in the TXT file, in this case, 100,000,000) and create another list, p2, with all the primes less than or equal to that square root, rounded up.

Step 4: Define an isPrime() method that checks whether the input is a prime number (N.B.: because I know that the numbers to be checked are all less than 100,000,000, I only have to check whether the number is divisible by any of the primes less than or equal to the square root of 100,000,000, which is 10,000).

Step 5: Create a list l that collects all the prime gaps, then iterate from 1 to 100,000,000, checking the primality of each number. If the number IS prime, record the gap between it and the last prime before it, and also write it to another file, "primes2.txt".

Step 6: Output the list l.

The problem:

The program seems to take a LONG time to run. I have a feeling that the problem has to do with how I am managing the list due to its size (the Prime Number Theorem estimates about 620,420 elements in that list from "primes.txt"). Is there a way to reduce the runtime for this program by handling the list differently?
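
(For reference, 620,420 is just the Prime Number Theorem estimate x / ln x; a quick check of that figure:)

import math

x = 10**7
print int(x / math.log(x))   # 620420, the estimated count of primes below 10,000,000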

I've attached my code below.

import math
import sys

# Read the precomputed primes (one per line) into the list p1.
f = open("primes.txt","r")
p1 = []
for i in f:
    p1.append(int(i))
f.close()

ml = 10000000                    # upper bound of the primes in primes.txt
ms = math.sqrt(10*ml)            # square root of the new upper bound, 100,000,000

# p2: the primes up to sqrt(100,000,000) = 10,000 -- the only trial
# divisors needed to test numbers below 100,000,000.
p2 = []
x1 = 0
while x1 < len(p1) and p1[x1] <= int(ms+0.5):
    p2.append(p1[x1])
    x1 += 1

def isPrime(n):
    for i in p2:
        if n%i == 0:
            if n/i == 1:         # n is the prime i itself
                return True
            return False
    return True

def main():
    l = [0]*1001                 # gap counts: index 0 is gap 1, index k is gap 2k (evens up to 2000)
    lastprime = -1
    diff = 0
    fileobject = open("primes2.txt",'w')
    for i in xrange(2,10*ml):    # start at 2; isPrime(1) would wrongly report 1 as prime
        if isPrime(i):
            if i > 2:
                diff = i - lastprime
                if diff == 1:    # the only odd gap, between 2 and 3
                    l[0] += 1
                else:
                    l[diff/2] += 1
            lastprime = i
            fileobject.write(str(i)+"\n")
        if i%(ml/100) == 0:      # progress report every 100,000 numbers
            print i/float(ml/10), "% complete"
    fileobject.close()
    print l

main()

EDIT: Made a change in how the program reads from the file

Upvotes: 1

Views: 981

Answers (2)

dawg

Reputation: 103814

Tips:

  1. For the first few X million, you can generate primes about as fast as you can read them from a file, but you need to do it efficiently. See below. (On my recent MacBook Pro, just generating the numbers n up to 100 million takes about 7 seconds in Python; actually sieving out the primes up to 100,000,000 takes about 4 minutes. It will be faster in PyPy, and far faster in C, Swift, or Python with NumPy.)

  2. You are careless with memory. For example, ps = f.read().split('\n') followed by p1 = [int(i) for i in ps] leaves the whole file sitting unused in memory in ps after the conversion. Wasteful. Read the file line by line instead, so the raw text does not linger in memory after conversion (see the sketch after this list).

  3. There are very good reasons that big primes are useful for cryptography; they take a long time to generate. Python is not the most efficient language to tackle this with.
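
For instance, a minimal sketch of the line-by-line read from tip 2 (using the primes.txt filename from the question):

p1 = []
with open("primes.txt") as f:
    for line in f:               # the file is streamed one line at a time
        line = line.strip()
        if line:                 # skip any blank lines
            p1.append(int(line))
# nothing but the growing list of ints is kept in memory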

Here is a sieve of Eratosthenes to try:

def sieve_of_e(n):
    """ Returns a list of primes < n """
    sieve = [True] * n                      # index k is True while k is still a prime candidate
    for i in xrange(3, int(n**0.5)+1, 2):   # odd candidates up to sqrt(n)
        if sieve[i]:
            # knock out i*i, i*i + 2i, i*i + 4i, ... (the odd multiples of i)
            sieve[i*i::2*i] = [False]*((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3, n, 2) if sieve[i]]
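
If it helps, here is a rough sketch of feeding that output into the gap tally the question asks for (the 10**7 limit is just to keep memory modest; the counting mirrors the list l in the question):

from itertools import izip
from collections import Counter

primes = sieve_of_e(10**7)             # bump to 10**8 if you have the memory for it
gaps = Counter()                       # gap size -> how many times it occurs
for a, b in izip(primes, primes[1:]):  # consecutive primes
    gaps[b - a] += 1
for gap in sorted(gaps):
    print gap, gaps[gap]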

Upvotes: 1

gku1123

Reputation: 11

Use the sieve of Eratosthenes to upgrade the prime function. Here's a link:

Sieve of Eratosthenes - Finding Primes Python
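
For example, a rough sketch of using a sieve as a drop-in replacement for the isPrime() check (the sieve_flags helper and the 10**7 limit here are illustrative; a bytearray or NumPy array would be kinder to memory at 10**8):

def sieve_flags(n):
    """Return flags where flags[k] is True exactly when k is prime, for k < n."""
    flags = [True] * n
    flags[0] = flags[1] = False
    for i in xrange(2, int(n**0.5) + 1):
        if flags[i]:
            flags[i*i::i] = [False] * len(flags[i*i::i])
    return flags

flags = sieve_flags(10**7)   # one-time cost; every later primality check is a constant-time lookup
print flags[97]              # True
print flags[91]              # False (91 = 7 * 13)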

Upvotes: 0
