Reputation: 43
For example: I have a function consisting of a while loop (this one would check for primes)
function isprime(int number){
int i=2;
int max= (int) sqrt(number)+1;
while(i<max){
if(number%i==0){
return 0;
}
i++;
}
return 1;
}
I know this would be a very poor algorithm for testing primes, but my question focuses more on loops anyway. Currently a forth of the function's operations are "just" checking the condition. For larger numbers, this can be an awful lot.
(Quick Example 1: If "number" is 1009, that would be checking the while condition 30 times, 30 operations for the index, and 29*2 operations for the if condition. That's 118 operations)
I realize that I can just cut&paste within the while condition and having the the index pass the maximum, while resulting in additional operations, doesn't falsify the return value. So if I cut everything starting from "if..." to "i++;" and paste it three (or n) times, checking the while condition would only take up 1/10 (or 1/(1+3n) of the operations, while creating a maximum of +2*3(or +(n-1)*3) unnecessary operations.
(Quick Example 2: If "number" is 1009, that would mean checking the while condition 11 times, 33 operations for the index, and 33*2 operations for the if condition. That's 100 operations, 13 less)
As I am currently experimenting with very big numbers (in layman's terms: "the condition will be false for a very, very, very long time") so pasting the if condition and the increment thousands of times would be useful, but very impractical - so my question is:
Is there a tool (or a technique I am missing) to do that for me, but keeps the code clearly and easy to modify?
Thanks in advance!
Upvotes: 1
Views: 209
Reputation: 1
Your question is a bit unclear.
First, you could change slightly your algorithm; e.g. incrementing by 2, not by 1 (since every prime number above 2 is odd).
Then, when asked to optimize (e.g. with g++ -Wall -O2
), a compiler generally do some loop unrolling, as if it "copied" (and constant-folded) a loop's body several times.
With GCC, you might help the optimizer by e.g. using __builtin_expect, e.g. with
#ifdef __GNUC__
#define MY_UNLIKELY(P) __builtin_expect((P),0)
#else
#define MY_UNLIKELY(P) (P)
#endif
and then you'll code
if(MY_UNLIKELY(number%i==0))
return 0;
At last, manually optimizing your code might not worth the pain, you should benchmark. (The CPU cache is very important today, so unrolling too much might slow down the code, see also __builtin_prefetch
in GCC and this).
You could also consider meta-programming and multi-staged programming facilities; in languages like Meta Ocaml or Common Lisp meta-programming means much more than C++11 template
things. You could consider generating C code at runtime (then dlopen
ing it), or using JIT compilation libraries like libjit
(e.g. do partial evaluation). Read also J.Pitrat's blog about meta-combinatorial search, and wikipage on eval. My MELT system shows that these techniques can (painfully) be used in relation with C++ (MELT generates C++ code at runtime so do meta-programming this way).
Upvotes: 5
Reputation: 299870
There is a weird snippet called Duff's Device which is applicable to the case where you know in advance how many iterations should occur.
So for example, let's suppose you have this loop:
for (size_t i = 0; i != max; ++i) {
call(i);
}
With Duff's Device, you can transform it to only check every 4 iterations like so:
size_t i = 0;
switch (max % 4) {
case 0: do { call(i++);
case 3: call(i++);
case 2: call(i++);
case 1: call(i++);
} while (i != max);
}
It thus combines manual unrolling with compensating for the fact that the number of iterations might not be a multiple of the number of times you unroll. There are more explanations here.
Though personally, I would advise using a slightly clearer format:
// Adjust "max" to a multiple of N:
size_t const adjusted_max = (max + N - 1) / N * N;
size_t const leftover_max = max - adjusted_max;
for (size_t i = 0; i != adjusted_max; i += N) {
call(i);
call(i);
// N times
}
switch (leftover_max) {
case N-1: call(adjusted_max + N - 1);
case N-2: call(adjusted_max + N -2);
...
case 1 : call(adjusted_max);
case 0 : ;
}
Note that you can process the leftover either before or after actually, it does not matter.
With that being said, Loop Unrolling is a basic compiler optimization; so before using such a weird implement, do check whether your compiler would not be already doing it for you...
Upvotes: 3