user1111111111111
user1111111111111

Reputation: 68

Python 2 lists of positive integers finding prime number

Given 2 lists of positive integers, find how many ways you can select a number from each of the lists such that their sum is a prime number.

My code is tooo slow As i have both list1 and list 2 containing 50000 numbers each. So any way to make it faster so it solves it in minutes instead of days?? :)

    # 2 is the only even prime number
    if n == 2: return True

    # all other even numbers are not primes
    if not n & 1: return False

    # range starts with 3 and only needs to go 
    # up the squareroot of n for all odd numbers
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0: return False

    return True



for i2 in l2:
    for i1 in l1:
        if isprime(i1 + i2):
            n = n + 1 # increasing number of ways
            s = "{0:02d}: {1:d}".format(n, i1 + i2)
            print(s) # printing out

Upvotes: 2

Views: 1285

Answers (3)

Tim Peters
Tim Peters

Reputation: 70602

Sketch:

  1. Following @Steve's advice, first figure out all the primes <= max(l1) + max(l2). Let's call that list primes. Note: primes doesn't really need to be a list; you could instead generate primes up the max one at a time.

  2. Swap your lists (if necessary) so that l2 is the longest list. Then turn that into a set: l2 = set(l2).

  3. Sort l1 (l1.sort()).

Then:

for p in primes:
    for i in l1:
        diff = p - i
        if diff < 0:
            # assuming there are no negative numbers in l2;
            # since l1 is sorted, all diffs at and beyond this
            # point will be negative
            break
        if diff in l2:
           # print whatever you like
           # at this point, p is a prime, and is the
           # sum of diff (from l2) and i (from l1)

Alas, if l2 is, for example:

l2 = [2, 3, 100000000000000000000000000000000000000000000000000]

this is impractical. It relies on that, as in your example, max(max(l1), max(l2)) is "reasonably small".

Fleshed out

Hmm! You said in a comment that the numbers in the lists are up to 5 digits long. So they're less than 100,000. And you said at the start that the list have 50,000 elements each. So they each contain about half of all possible integers under 100,000, and you're going to have a very large number of sums that are primes. That's all important if you want to micro-optimize ;-)

Anyway, since the maximum possible sum is less than 200,000, any way of sieving will be fast enough - it will be a trivial part of the runtime. Here's the rest of the code:

def primesum(xs, ys):
    if len(xs) > len(ys):
        xs, ys = ys, xs
    # Now xs is the shorter list.
    xs = sorted(xs)  # don't mutate the input list
    sum_limit = xs[-1] + max(ys)  # largest possible sum
    ys = set(ys)     # make lookups fast
    count = 0
    for p in gen_primes_through(sum_limit):
        for x in xs:
            diff = p - x
            if diff < 0:
                # Since xs is sorted, all diffs at and
                # beyond this point are negative too.
                # Since ys contains no negative integers,
                # no point continuing with this p.
                break
            if diff in ys:
                #print("%s + %s = prime %s" % (x, diff, p))
                count += 1
    return count

I'm not going to supply my gen_primes_through(), because it's irrelevant. Pick one from the other answers, or write your own.

Here's a convenient way to supply test cases:

from random import sample
xs = sample(range(100000), 50000)
ys = sample(range(100000), 50000)
print(primesum(xs, ys))

Note: I'm using Python 3. If you're using Python 2, use xrange() instead of range().

Across two runs, they each took about 3.5 minutes. That's what you asked for at the start ("minutes instead of days"). Python 2 would probably be faster. The counts returned were:

219,334,097

and

219,457,533

The total number of possible sums is, of course, 50000**2 == 2,500,000,000.

About timing

All the methods discussed here, including your original one, take time proportional to the product of two lists' lengths. All the fiddling is to reduce the constant factor. Here's a huge improvement over your original:

def primesum2(xs, ys):
    sum_limit = max(xs) + max(ys)  # largest possible sum
    count = 0
    primes = set(gen_primes_through(sum_limit))
    for i in xs:
        for j in ys:
            if i+j in primes:
                # print("%s + %s = prime %s" % (i, j, i+j))
                count += 1
    return count

Perhaps you'll understand that one better. Why is it a huge improvement? Because it replaces your expensive isprime(n) function with a blazing fast set lookup. It still takes time proportional to len(xs) * len(ys), but the "constant of proportionality" is slashed by replacing a very expensive inner-loop operation with a very cheap operation.

And, in fact, primesum2() is faster than my primesum() in many cases too. What makes primesum() faster in your specific case is that there are only around 18,000 primes less than 200,000. So iterating over the primes (as primesum() does) goes a lot faster than iterating over a list with 50,000 elements.

A "fast" general-purpose function for this problem would need to pick different methods depending on the inputs.

Upvotes: 2

hughdbrown
hughdbrown

Reputation: 49013

I would find the highest number in each range. The range of primes is the sum of the highest numbers.

Here is code to sieve out primes:

def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)

It takes around 3 seconds to find the primes up to 10 000 000. Then I would use the same n ^ 2 algorithm you are using for generating sums. I think there is an n logn algorithm but I can't come up with it.

It would look something like this:

from collections import defaultdict
possible = defaultdict(int)
for x in range1:
    for y in range2:
        possible[x + y] += 1

def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)

n = max(possible.keys())
primes = eras(n)
possible_primes = set(possible.keys()).intersection(set(primes))

for p in possible_primes:
    print "{0}: {1} possible ways".format(p, possible[p])

Upvotes: 1

Steve
Steve

Reputation: 7271

You should use the Sieve of Eratosthenes to calculate prime numbers.

You are also calculating the prime numbers for each possible combination of sums. Instead, consider finding the maximum value you can achieve with the sum from the lists. Generate a list of all the prime numbers up to that maximum value.

Whilst you are adding up the numbers, you can see if the number appears in your prime number list or not.

Upvotes: 1

Related Questions