PanC

Reputation: 11

Sudden reduction in python iteration speed after 10 seconds

I am writing some code in python to find out the first natural number which does not appear in the first billion digits of pi. Here's what I've written:

import datetime
from tqdm import tqdm

def findpi(end):
    notin = []   # numbers not found in the digits
    stop = 0     # first number not found
    s1 = datetime.datetime.now()
    with open(r"C:\Users\shamm\Desktop\Text Documents\1 BILLION Digits of pi.txt", 'rt') as p:
        pi = p.read()
        for i in tqdm(range(1, end + 1)):
            if str(i) not in pi:
                if notin == []:
                    stop = i
                notin.append(i)
    s2 = datetime.datetime.now()
    tdelta = s2 - s1
    ts = tdelta.total_seconds()
    return [notin, stop, ts]

pi = findpi(1000000)
print("Not in:", pi[0])
print("Last:", pi[1])
print("Time taken:", pi[2])

It works fine for small inputs, but when I tried it with the first million natural numbers, the code hits a sudden wall. For the first 10 seconds it runs at about 10k iterations/s, then it suddenly drops to 1k iterations/s. I tried a larger input of 10 million and the same thing happens: it runs at 10k it/s for only 10 seconds, and after 30+ minutes of running it drops to 100 it/s.

Is there a bottleneck which limits it from using too much memory or processing power? Or is something wrong in my code?

Edit: it seems the cause is the ever-increasing number of digits in the number being searched for. How can I optimize the search so that it does not slow down as the digit count grows?

Upvotes: 0

Views: 69

Answers (1)

no comment

Reputation: 10466

That's expected. You slow down by a factor of 10 every time the numbers gain a digit: numbers with k+1 digits take about ten times longer to find than numbers with k digits, because their first k digits are equally hard to find, but there is only a 10% chance that the extra digit also matches.
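The 10x factor follows from a simple expected-count argument (a back-of-the-envelope sketch, assuming the digits behave like uniform random digits):

```python
# In a string of n random digits there are ~n starting positions, and each
# matches a given k-digit number with probability 10**-k, so the expected
# number of occurrences drops by 10x per extra digit.
n = 1_000_000_000
for k in range(1, 10):
    expected = n / 10**k
    print(f"{k}-digit number: ~{expected:.0f} expected occurrences")
```

Once the expected count falls toward 1 (around k = 9 here), searches increasingly scan the whole string and find nothing, which is exactly where the iteration rate collapses.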

You could instead go through pi's digits, building the numbers it does contain, and record them. Then a second pass can quickly look up whether each number was seen:

    # Replaces the body of findpi(): mark every number that occurs as a
    # prefix of some suffix of the digit string, then check the marks.
    with open(r"...", 'rb') as p:
        pi = bytearray(p.read())
        # Map the ASCII digits b'0'..b'9' to the byte values 0..9.
        pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
        seen = bytearray(end+1)    # seen[i] == 1 iff i occurs in the digits
        while pi:
            i = 0
            for d in pi:           # build the numbers starting at this position
                i = i * 10 + d
                if i > end:
                    break
                seen[i] = 1
            del pi[0]              # drop the front digit (cheap for a bytearray)
        for i in tqdm(range(1,end+1)):
            if not seen[i]:
                if notin == []:
                    stop = i
                notin.append(i)

With 10 million digits, findpi(1000000) takes me about 6 seconds, so I estimate with a billion digits it will take about 10 minutes.
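As a quick sanity check, here is the same marking idea run self-contained on a tiny made-up digit string instead of the billion-digit file (the `digits` value and `end` limit are illustrative, not from the question):

```python
# Verify the "mark every number that occurs as a prefix of some suffix" idea
# on a short digit string.
digits = "141592653589793"   # a few digits of pi, standing in for the file
end = 30
pi = bytearray(digits.encode())
pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
seen = bytearray(end + 1)
while pi:
    i = 0
    for d in pi:             # numbers starting at the current position
        i = i * 10 + d
        if i > end:
            break
        seen[i] = 1
    del pi[0]                # advance to the next starting position
missing = [i for i in range(1, end + 1) if not seen[i]]
print(missing)
```

Every number in `missing` is indeed absent from the digit string, matching what the slow `str(i) not in pi` version would report.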

Upvotes: 0
