ILoveG11
ILoveG11

Reputation: 69

Performance - why is a prime-generating algorithm with range much faster than using prime list?

I wrote two codes with almost same structure,

def prime_gen1(Limit = 10000):
    List = [2,]
    for x in range(3,Limit):
        for y in List:
            if not x%y:
                break
        if not x%y:
            continue
        else:
            List.append(x)
            yield x

def prime_gen2(Limit = 10000):
    from math import floor
    for x in range(3,Limit):
        for y in range(2, floor(x**0.5)+2):
            if not x%y:
                break
        if not x%y:
            continue
        else:
            yield x

>>> list(prime_gen1(20000)) == list(prime_gen2(20000))
True
>>> def time1(number):
    st = time()
    list(prime_gen1(number))
    end = time()
    return end - st

>>> def time2(number):
    st = time()
    list(prime_gen2(number))
    end = time()
    return end - st

One does same work as other, but the latter actually works much faster. I'm wondering why this happens.

Logically - or nonlogically, I thought checking with primes will outdo the other way, in this case- checking by numers between 3 and root of number.But time-checking showed vice versa, checking with all the numers works much faster - about 5 times. Its performance increasingly differs,

>>> time1(200000)
8.185129404067993
>>> time2(200000)
0.4998643398284912

Second method is outdoing it. What makes this different?

Upvotes: 1

Views: 84

Answers (2)

FHTMitchell
FHTMitchell

Reputation: 12157

Some better timings

%timeit list(prime_gen1(10**5))
2.77 s ± 204 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(prime_gen2(10**5))
219 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

You take a number of optimisation steps in your second algorithm but you don't in your first: here are some problems with your first algorithm.

def prime_gen1(Limit = 10000):
    List = [2,]
    for x in range(3,Limit):  # we're checking all even numbers too
        for y in List:  # only need to check up to sqrt(x)!!
            if not x%y:
                break
        if not x%y:  # why are we checking again? Use a for-else construct
            continue
        else:
            List.append(x)  # just return the list at the end
            yield x  # when wrapped in list just copies List

Here is an optimised version of the first algorithm (not a generator because a generator from a list is just pointless):

def memeff_primelist(n):
    if n <= 2:
        return []
    primes = []  # Add 2 in at the end, don't need to check non-even
    for i in range(3, int(n), 2):
        for p in primes:
            if i % p == 0:  # non-prime
                break
            if p * p > i:  # no factors bigger than sqrt(i)!!!
                primes.append(i)
                break
        else:
            primes.append(i)  # only for i == 3
    primes.insert(0, 2)
    return primes

%timeit memeff_primelist(10**5)
88.9 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Upvotes: 1

Derte Trdelnik
Derte Trdelnik

Reputation: 2706

the list version does a lot more checks than the one going only to a square root of the number

for limit 200000 the square root is ~447 there are 17983 prime numbers smaller than 200000

just add a count of how many times you do the x%y check like

def prime_gen1(Limit = 10000):
    List = [2,]
    modulo_checks = 0
    for x in range(3,Limit):
        for y in List:
            modulo_checks += 1
            if not x%y:
                break
        if not x%y:
            continue
        else:
            List.append(x)
            yield x
    print(modulo_checks)

def prime_gen2(Limit = 10000):
    from math import floor
    modulo_checks = 0
    for x in range(3,Limit):
        for y in range(2, floor(x**0.5)+2):
            modulo_checks += 1
            if not x%y:
                break
        if not x%y:
            continue
        else:
            yield x
    print(modulo_checks)

now for limit 200000 the version 1 does 162416226 checks and the second one 7185445

if you add an early break for looping of the list the list version is significantly faster(2times as fast 1799767 checks 0.24s vs 7185445 checks 0.64s)

...
    sq_root = floor(x ** 0.5) + 2
    for y in List:
        modulo_checks += 1
        if not x % y or y > sq_root:
            break
...

and remove the math import if you want to compare algorithm speeds

Upvotes: 4

Related Questions