Reputation: 60451
Consider the following example:
template<int X> class MyClass
{
    public:
        MyClass(int x) {_ncx = x;}
        void test()
        {
            for (unsigned int i = 0; i < 1000000; ++i) {
                if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
                    /* SOMETHING */
                } else {
                    /* SOMETHING */
                }
            }
        }
    protected:
        static const int _cx = (X < 0) ? (-X) : (X);
        int _ncx;
};
My question is: will MyClass<-6>::test() and MyClass<6>::test() run at different speeds?
I hope so, because with a negative template parameter the if in the test function can be evaluated at compile time. However, I'm not sure how a compiler behaves when a ternary operator mixes a compile-time operand with a run-time operand, which is the case here.
Note: this is a purely "theoretical" question. If there is a non-zero probability that the answer is "yes", I will implement classes with such compile-time template parameters in my code; if not, I will provide only run-time versions.
Upvotes: 2
Views: 793
Reputation: 73570
I compiled this similar, but not identical, code with my compiler (clang++ v2.9 on OS X):
void foo();
void bar();

template<int N>
void do_something( int arg ) {
    if ( N<0 && arg<0 ) { foo(); }
    else { bar(); }
}

// Some functions to instantiate the templates.
void one_fn(int arg) {
    do_something<1>(arg);
}
void neg_one_fn(int arg) {
    do_something<-1>(arg);
}
This generates the following assembly with clang++ -S -O3.
The first function's assembly clearly contains only the call to bar:
.globl __Z6one_fni
.align 4, 0x90
__Z6one_fni: ## @_Z6one_fni
Leh_func_begin0:
pushl %ebp
movl %esp, %ebp
popl %ebp
jmp __Z3barv ## TAILCALL
Leh_func_end0:
The second function has been reduced to a simple if that calls either bar or foo:
.globl __Z10neg_one_fni
.align 4, 0x90
__Z10neg_one_fni: ## @_Z10neg_one_fni
Leh_func_begin1:
pushl %ebp
movl %esp, %ebp
cmpl $0, 8(%ebp)
jns LBB1_2 ## %if.else.i
popl %ebp
jmp __Z3foov ## TAILCALL
LBB1_2: ## %if.else.i
popl %ebp
jmp __Z3barv ## TAILCALL
Leh_func_end1:
So you can see that the compiler inlined the template, then optimised away the branch when it could. So the kind of transformation you are hoping for does occur in current compilers. I got similar results (but less clear assembly) from an old g++ 4.0.1 compiler too.
I decided this example wasn't quite similar enough to your initial case (as it didn't involve the ternary operator), so I changed it to the following and got the same kind of results:
template<int X>
void do_something_else( int _ncx ) {
    static const int _cx = (X<0) ? (-X) : (X);
    if ( (X < 0) ? (_cx > 5) : (_ncx > 5)) {
        foo();
    } else {
        bar();
    }
}

void a(int arg) {
    do_something_else<1>(arg);
}
void b(int arg) {
    do_something_else<-1>(arg);
}
This generates the following assembly.
The code for a (positive template parameter) still contains the branch:
__Z1ai: ## @_Z1ai
Leh_func_begin2:
pushl %ebp
movl %esp, %ebp
cmpl $6, 8(%ebp)
jl LBB2_2 ## %if.then.i
popl %ebp
jmp __Z3foov ## TAILCALL
LBB2_2: ## %if.else.i
popl %ebp
jmp __Z3barv ## TAILCALL
Leh_func_end2:
For b (negative template parameter), the branch is optimised away:
__Z1bi: ## @_Z1bi
Leh_func_begin3:
pushl %ebp
movl %esp, %ebp
popl %ebp
jmp __Z3barv ## TAILCALL
Leh_func_end3:
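The results above rely on the optimiser. Where a guarantee is preferred, C++17's if constexpr is one option, since the untaken branch is discarded during template instantiation. The following is a minimal sketch under that assumption (a C++17 compiler); do_something_constexpr is a hypothetical name, and foo and bar are given stand-in bodies that return a tag so the selected path can be observed.

```cpp
#include <cassert>

// Hypothetical stand-ins for foo() and bar(): each returns a tag value
// so that a caller can tell which path was taken.
inline int foo() { return 1; }
inline int bar() { return 2; }

template<int X>
int do_something_constexpr(int ncx) {
    constexpr int cx = (X < 0) ? -X : X;
    if constexpr (X < 0) {
        // Only compile-time values are involved here, so the choice
        // between foo() and bar() is made during instantiation.
        if constexpr (cx > 5) {
            return foo();
        } else {
            return bar();
        }
    } else {
        // The run-time argument decides, so a comparison remains.
        return (ncx > 5) ? foo() : bar();
    }
}
```

With this sketch, do_something_constexpr<-6>(arg) always takes the foo path and do_something_constexpr<-1>(arg) always takes the bar path regardless of arg, while do_something_constexpr<1>(arg) still tests arg at run time.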
Upvotes: 2
Reputation: 4527
Move the conditional outside the loop:
...
if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
    for (unsigned int i = 0; i < 1000000; ++i) {
        /* SOMETHING */
    }
} else {
    for (unsigned int i = 0; i < 1000000; ++i) {
        /* SOMETHING */
    }
}
...
That way you don't depend on the compiler optimization to remove unused code; if the unused part of the conditional is not removed by the compiler you just pay for a conditional branch once, and not every time around the loop.
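As a fuller illustration of the hoisting described above, here is a minimal sketch of the question's class with the condition evaluated once before the loops. HoistedClass, cx_, ncx_ and the counting loop bodies are assumptions made for this example; the original /* SOMETHING */ bodies are unknown, so the "then" loop simply counts its iterations to give the function an observable result.

```cpp
#include <cassert>

template<int X>
class HoistedClass {
public:
    explicit HoistedClass(int x) : ncx_(x) {}

    // Returns the number of iterations that ran on the "then" path.
    unsigned int test() const {
        unsigned int hits = 0;
        // Evaluated once, outside the loops. For X < 0 both operands are
        // compile-time constants; otherwise ncx_ is read at run time.
        const bool cond = (X < 0) ? (cx_ > 5) : (ncx_ > 5);
        if (cond) {
            for (unsigned int i = 0; i < 1000000; ++i) {
                ++hits;  // placeholder for the first /* SOMETHING */
            }
        } else {
            for (unsigned int i = 0; i < 1000000; ++i) {
                // placeholder for the second /* SOMETHING */
            }
        }
        return hits;
    }

private:
    static constexpr int cx_ = (X < 0) ? -X : X;
    int ncx_;
};
```

Even if the compiler does not fold the condition, the comparison is then paid for once per call to test() rather than once per iteration.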
Upvotes: 2
Reputation: 4467
It probably depends on how smart your compiler is. I recommend you write a little benchmark program to test it out yourself in your environment to find out for sure.
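A minimal sketch of such a benchmark, assuming std::chrono::steady_clock is an acceptable timer, might look like the following. The Timing struct, the time_test helper and the loop bodies are hypothetical; the bodies do a small amount of arithmetic only so that the loop has a result the compiler cannot simply discard.

```cpp
#include <cassert>
#include <chrono>

template<int X>
class MyClass {
public:
    explicit MyClass(int x) : _ncx(x) {}

    // Returns an accumulated value so the loop has an observable effect.
    unsigned int test() const {
        unsigned int hits = 0;
        for (unsigned int i = 0; i < 1000000; ++i) {
            if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
                hits += i & 1u;  // placeholder work, "then" path
            } else {
                hits += 2u;      // placeholder work, "else" path
            }
        }
        return hits;
    }

protected:
    static constexpr int _cx = (X < 0) ? -X : X;
    int _ncx;
};

// Hypothetical result type: elapsed time plus the value computed.
struct Timing {
    long long nanoseconds;
    unsigned int result;
};

template<int X>
Timing time_test(int x) {
    MyClass<X> obj(x);
    const auto start = std::chrono::steady_clock::now();
    // Storing into a volatile keeps the call from being optimised out.
    volatile unsigned int sink = obj.test();
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    Timing t;
    t.nanoseconds = static_cast<long long>(elapsed.count());
    t.result = sink;
    return t;
}
```

Comparing time_test<-6>(n).nanoseconds with time_test<6>(n).nanoseconds, where n comes from a run-time source such as a command-line argument, gives a rough indication. A single run is noisy, and an optimiser may reduce such simple loop bodies to a closed form, so repeated runs with realistic loop bodies are more informative.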
Upvotes: 0