Reputation: 11
In a given challenge, the function below is required to execute in less than 1 second:
from time import perf_counter

def loop(seed, n):
    tic_0 = perf_counter()
    r, m = seed, 0
    for i in range(1, n+1):
        m += 362437
        r = (r ** 2 + m) % 4294967296
        r = (r // 256) % 65536
    print('\nTotal Runtime = {:.6f}'.format(perf_counter() - tic_0))
    return r

loop(366, 11223300)  # must finish within the 1 second limit
The only tip provided is: a compiled program can run up to 200 times faster than a mathematically identical script in the Python interpreter.
As each term of the loop depends on the previous one, the multiprocessing module was of no use to me. Could you give me any idea how this tip can be applied?
Code execution currently takes about 10 seconds.
Upvotes: 1
Views: 201
Reputation: 881463
Well, that only took 4.04s on my box, but I got an instant reduction to 2.5s by simply changing r ** 2 into r * r. I could further get it down to 1.6s by using bit shifting and bitwise operations, changing:
r = (r * r + m) % 4294967296
r = (r // 256) % 65536
into:
r = (r * r + m) & 0xffffffff
r = (r >> 8) & 0xffff
Then, when you realise which bits are going to survive those two operations, you can turn that into the single expression:
r = ((r * r + m) & 0xffff00) >> 8
That drops the time down to 1.33s, tripling the speed (a reduction of 67% in elapsed time over the original 4.04s).
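To convince yourself the collapsed expression is safe, a quick brute-force check (my own addition, not part of the original answer) over random inputs confirms it matches the original two-step arithmetic bit for bit:

```python
import random

def two_step(r, m):
    # the original arithmetic: 32-bit wrap, then divide and modulo
    t = (r * r + m) % 4294967296
    return (t // 256) % 65536

def one_step(r, m):
    # the collapsed expression: keep bits 8..23, then shift them down
    return ((r * r + m) & 0xffff00) >> 8

for _ in range(100_000):
    r = random.randrange(65536)    # r is always a 16-bit value here
    m = random.randrange(2 ** 40)  # m grows well past 32 bits
    assert two_step(r, m) == one_step(r, m)
```

Both versions extract bits 8 to 23 of r * r + m, and the % 4294967296 only discards bits at position 32 and above, so it cannot affect the result.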
If you switch to something like Numba, you can get even more improvement:
from numba import jit
from time import perf_counter

@jit()
def loop(seed, n):
    r, m = seed, 0
    for i in range(n):
        m += 362437
        r = ((r * r + m) & 0xffff00) >> 8
    return r

tic_0 = perf_counter()
x = loop(366, 11223300)
print('\nTotal Runtime = {:.6f}'.format(perf_counter() - tic_0))
print(x)
That brings the execution time down to a rather minuscule 110ms, well within your one-second limit, and roughly a 97% reduction in elapsed time over the original.
And that actually includes the time taken to initially JIT-compile the code, so the improvement is even better on subsequent calls. By changing the code so that we JIT the function first, the time drops to 20ms, a saving of 99.5%:
x = loop(1, 2)  # throwaway call: triggers the JIT compilation before we start timing
tic_0 = perf_counter()
x = loop(366, 11223300)
print('\nTotal Runtime = {:.6f}'.format(perf_counter() - tic_0))
print(x)
That's on par with similar C code, which you could probably treat as the benchmark for performance. For example, the following C code executes in 27ms, a reduction of about 99.3% in time taken (that's with default optimisation; using -O3 gives about 18ms):
#include <stdio.h>

int main(void) {
    unsigned long r = 366, m = 0;
    for (int i = 0; i < 11223300; ++i) {
        m += 362437;
        r = ((r * r + m) & 0xffff00UL) >> 8;
    }
    printf("%lu\n", r);  /* %lu, since r is unsigned long */
    return 0;
}
Upvotes: 1
Reputation: 396
Is cheating allowed?
The initial seed value is not stored, so if the value of r ever hits zero, the initial seed becomes meaningless. The & 0xffff masking takes place every loop, and the chance that a randomly picked number has those 16 masked bits all zero (so that x & 0xffff == 0) is (1/16)**4 = 1/65536.
That happens 187 times for the given initial values, roughly once every 60017 iterations, pretty close to the expected 1-in-65536 rate.
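That count can be checked directly (a sketch of my own; it replays the recurrence in the collapsed single-expression form, which produces the same values as the original two-step version, and tallies the iterations where r lands on 0):

```python
def count_zero_hits(seed, n):
    # replay the recurrence and count how often r becomes 0
    r, m, zeros = seed, 0, 0
    for i in range(1, n + 1):
        m += 362437
        r = ((r * r + m) & 0xffff00) >> 8  # same update, collapsed form
        if r == 0:
            zeros += 1
    return zeros
```

For seed 366 and n = 11223300 this reports the 187 zeros quoted above; at one expected zero per 65536 iterations, 11223300 / 65536 is about 171, so 187 is well within the range chance would give you.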
That means you only need to calculate the last million or so values to have an "optimized" loop that is observationally identical:
from time import perf_counter

def loop2(seed, n):
    r = seed  # m needs no separate counter: at step i, m == i * 362437
    for i in range(max(1, n - 1000000), n + 1):
        r = ((r * r + i * 362437) >> 8) & 0xffff
    return r
tic_0 = perf_counter()
for init in range(1000):
    result = loop2(init, 11223300)
    assert result == 1738
print('\nTotal Runtime = {:.6f}'.format(perf_counter() - tic_0))
These 1000 loops all gave the same answer and took 315 seconds to run on my machine, an average of 0.315 seconds per call.
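Why every seed converges to the same answer can be sketched independently (my own addition, not part of the original answer): r can only take 65536 values, so two runs almost surely hold the same r at the same iteration somewhere in a long window, and from that point on the recurrence keeps them identical, since the next value depends only on the shared r and the shared iteration counter:

```python
def step(r, i):
    # one iteration of the recurrence; m has the closed form i * 362437
    return ((r * r + i * 362437) & 0xffff00) >> 8

def first_merge(seed_a, seed_b, n):
    """First iteration at which two runs hold the same r, else None."""
    ra, rb = seed_a, seed_b
    for i in range(1, n + 1):
        ra, rb = step(ra, i), step(rb, i)
        if ra == rb:
            return i
    return None

# two arbitrary seeds; over a million steps a same-index coincidence
# is overwhelmingly likely, and once merged the runs never diverge
k = first_merge(366, 999, 1000000)
```

Hitting r == 0 is just the easiest merge point to reason about; any coincidence of values at the same iteration has the same effect.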
Upvotes: 0