Reputation: 31349
Suppose I have a function foo that comes in two flavors, selected via a flag.
void foo(bool isModeA) {
// a lot of shared code
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
if(isModeA){
// do modeA code
} else {
// do modeB code
}
}
}
}
Unfortunately, this method is time critical and I want to avoid the conditional in the innermost loop. I have two rather unsatisfying solutions so far: duplicate the code of foo into fooModeA() and fooModeB(), and move the if logic to the function call.
I wonder if there is a cleaner solution here. Is there a way to keep the original code but convince the compiler to create two functions for me implicitly? Or maybe a clever way to restructure the code?
Upvotes: 3
Views: 199
Reputation: 275730
This requires C++14 (generic lambdas), but it is a feature that most compilers implemented very early in their C++14 support.
void foo(bool isModeA) {
// a lot of shared code
auto loops = [&](auto isModeA) {
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
if(isModeA){
// do modeA code
} else {
// do modeB code
}
}
}
};
if (isModeA) {
loops( std::integral_constant<bool, true>{} );
} else {
loops( std::integral_constant<bool, false>{} );
}
}
This should compile down to exactly the code you want on any real C++ compiler with optimization enabled.
If you return from within the loops, things are a bit harder. A std::optional can help here: the lambda returns an optional of the return type R, and outside you return its value whenever it is engaged:
auto loops = [&](auto isModeA)->std::optional<R> {
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
if(isModeA){
// do modeA code
} else {
// do modeB code
}
}
}
return std::nullopt; // no early return happened inside the loops
};
if (isModeA) {
if (auto r = loops( std::integral_constant<bool, true>{} ))
return *r;
} else {
if (auto r = loops( std::integral_constant<bool, false>{} ))
return *r;
}
}
Alternatively, use boost::optional. For a void return, just return a bool from loops.
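The void case mentioned above might be sketched as follows. This is only an illustration under assumptions: the sizes N and M, the visited counter, and the early-exit condition are hypothetical stand-ins for the real mode A and mode B code. The lambda returns true when the enclosing function should return immediately.

```cpp
#include <type_traits>

// Hypothetical sizes, for illustration only.
constexpr int N = 3;
constexpr int M = 3;

// 'visited' is a hypothetical output used to make the control flow observable.
void foo(bool isModeA, int& visited) {
    visited = 0;
    // loops returns true when foo should return early.
    auto loops = [&](auto modeA) -> bool {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < M; j++) {
                ++visited;
                if (modeA) {
                    // placeholder mode A code: stop early at (1, 1)
                    if (i == 1 && j == 1) return true;
                } else {
                    // placeholder mode B code: never stops early
                }
            }
        }
        return false;
    };
    if (isModeA) {
        if (loops(std::true_type{})) return;
    } else {
        if (loops(std::false_type{})) return;
    }
    // Code after the loops; skipped when loops asked for an early return.
    visited = -visited;
}
```

Because modeA has a distinct type per call, each instantiation of the lambda sees a compile-time constant condition, so an optimizing compiler can drop the dead branch.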
Upvotes: 1
Reputation: 5613
If you just want to push the problem "uphill" to the caller, in C++17 you could declare isModeA as a template parameter and use if constexpr (isModeA), so your code still looks the same.
The caller would then have to choose to call Foo<true>() or Foo<false>(), or you could write another Foo(F isModeA) that contains that branching decision.
But I still think this is premature optimisation, at least in this simplified world of the example.
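A minimal sketch of the template approach described above, assuming C++17. The names fooImpl, the sizes N and M, and the arithmetic in each branch are hypothetical placeholders, not part of the original question; the int return value exists only to make the result observable.

```cpp
// Hypothetical sizes, for illustration only.
constexpr int N = 3;
constexpr int M = 4;

template <bool IsModeA>
int fooImpl() {
    int sum = 0;  // stands in for "a lot of shared code"
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            if constexpr (IsModeA) {
                sum += i + j;  // placeholder for mode A code
            } else {
                sum += i * j;  // placeholder for mode B code
            }
        }
    }
    return sum;
}

// Runtime wrapper: the only runtime branch sits here, outside the loops.
int foo(bool isModeA) {
    return isModeA ? fooImpl<true>() : fooImpl<false>();
}
```

With if constexpr, the discarded branch is not instantiated at all, so each of the two generated functions contains only its own loop body.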
Upvotes: 0
Reputation: 114559
In your example you can avoid the inner loop conditional by duplicating just a couple of lines of code (the nested for loops):
void foo(bool isModeA) {
// a lot of shared code
if (isModeA) {
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
// do modeA code
}
}
} else {
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
// do modeB code
}
}
}
}
Upvotes: 3
Reputation: 50550
If you don't want to create two different foo functions, you can still use lambdas and invoke them in place with the right code to execute. This way the if is evaluated only once and everything is packed within foo.
As a minimal, working example:
template<typename F>
void foo(F isModeA) {
// a lot of shared code
[](auto f){
for(int i = 0; i < 10; i++) {
for(int j = 0; j < 10; j++) {
f();
}
}
}(isModeA()
? []() { /* code for modeA */ }
: []() { /* code for modeB */ }
);
}
int main() {
foo([](){ return true; });
foo([](){ return false; });
}
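Adjust the capture list of the outer lambda according to your requirements. The two lambdas inside the conditional expression must stay capture-free: they have different types, and the conditional operator only compiles because both convert to the same function pointer type, which lambdas with captures cannot do.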
Upvotes: 1
Reputation: 815
How do you use isModeA? Is it modified inside the loop?
If not, and you want to preserve the code structure, you can use a function pointer, which eliminates the if(isModeA) statement from the loops, somewhat like:
void fooModeA(){}
void fooModeB(){}
void foo(bool isModeA) {
// a lot of shared code
// Pick the loop body once, outside the loops.
void (*body)() = isModeA ? fooModeA : fooModeB;
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
body();
}
}
}
Upvotes: 0
Reputation: 81976
Given the code:
const int N = 50;
const int M = 60;
void doModeA(int i, int j);
void doModeB(int i, int j);
void foo(bool isModeA) {
for(int i = 0; i < N; i++) {
for(int j = 0; j < M; j++) {
if(isModeA){
doModeA(i, j);
} else {
doModeB(i, j);
}
}
}
}
Clang / LLVM will compile this into something like:
define void @_Z3foob(i1 zeroext) local_unnamed_addr #0 {
br i1 %0, label %3, label %2
; <label>:2: ; preds = %1
br label %9
; <label>:3: ; preds = %1
br label %4
; <label>:4: ; preds = %3, %4
%5 = phi i32 [ %6, %4 ], [ 0, %3 ]
tail call void @_Z7doModeAii(i32 %5, i32 0)
tail call void @_Z7doModeAii(i32 %5, i32 1)
....
tail call void @_Z7doModeAii(i32 %5, i32 58)
tail call void @_Z7doModeAii(i32 %5, i32 59)
%6 = add nuw nsw i32 %5, 1
%7 = icmp eq i32 %6, 50
br i1 %7, label %8, label %4
; <label>:8: ; preds = %9, %4
ret void
; <label>:9: ; preds = %2, %9
%10 = phi i32 [ %11, %9 ], [ 0, %2 ]
tail call void @_Z7doModeBii(i32 %10, i32 0)
tail call void @_Z7doModeBii(i32 %10, i32 1)
...
tail call void @_Z7doModeBii(i32 %10, i32 58)
tail call void @_Z7doModeBii(i32 %10, i32 59)
%11 = add nuw nsw i32 %10, 1
%12 = icmp eq i32 %11, 50
br i1 %12, label %8, label %9
}
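This implements the optimization you're asking for (loop unswitching): the branch on isModeA is taken once, before the loops, and each mode gets its own copy of the loop. So just write the code that's readable, and let the compiler do its job. That's what it's there for.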
Upvotes: 4