Reputation: 31349

Efficiently derive two versions of a common function based on a boolean flag

Suppose I have a function foo that comes in two flavors that is passed via a flag.

void foo(bool isModeA) {

    // a lot of shared code

    for(int i = 0; i < N; i++) {
       for(int j = 0; j < M; j++) {
           if(isModeA){
              // do modeA code
           } else {
              // do modeB code
           }
       }
    }
}

Unfortunately, this method is time critical and I want to avoid the conditional in the innermost loop. I have two rather unsatisfying solutions so far:

1. Create two versions of foo by duplicating the code, fooModeA() and fooModeB(), and move the if logic to the function call.
2. Use templating to create two versions of the function, again moving the if logic to the function call.

I wonder if there is a cleaner solution here. Is there a way to keep the original code but convince the compiler to create two functions for me implicitly? Or maybe a clever way to restructure the code?

Upvotes: 3

Answers (6)

Yakk - Adam Nevraumont

Reputation: 275730

This requires C++14 (generic lambdas), but it is a feature most compilers implemented very early in their C++14 support.

void foo(bool isModeA) {

  // a lot of shared code

  auto loops = [&](auto isModeA) {
    for(int i = 0; i < N; i++) {
      for(int j = 0; j < M; j++) {
        if(isModeA){
          // do modeA code
        } else {
          // do modeB code
        }
      }
    }
  };
  if (isModeA) {
    loops( std::integral_constant<bool, true>{} );
  } else {
    loops( std::integral_constant<bool, false>{} );
  }
}

This should compile down to exactly the code you want on any real C++ compiler with optimization enabled.

If you return from within the loops, things are a bit harder. A std::optional can help here: have loops return an optional of the return type R, and have the enclosing function return the contained value whenever one is present:

  auto loops = [&](auto isModeA)->std::optional<R> {
    for(int i = 0; i < N; i++) {
      for(int j = 0; j < M; j++) {
        if(isModeA){
          // do modeA code
        } else {
          // do modeB code
        }
      }
    }
    return std::nullopt; // fell through: the loops did not return early
  };
  if (isModeA) {
    if (auto r = loops( std::integral_constant<bool, true>{} ))
      return *r;
  } else {
    if (auto r = loops( std::integral_constant<bool, false>{} ))
      return *r;
  }
}

or boost::optional. For void return, just return a bool from loops.
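
To make the early-return variant concrete, here is a minimal self-contained sketch. It assumes C++17 (for std::optional and if constexpr, which the original C++14 snippet does not need), and the function findFirst, its parameters, and the two search rules are hypothetical stand-ins for the mode A / mode B bodies:

```cpp
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical example: return the first element that matches a mode-dependent test.
// Mode A looks for the first negative value, mode B for the first value above `limit`.
std::optional<int> findFirst(const std::vector<std::vector<int>>& grid,
                             bool isModeA, int limit) {
    // Generic lambda: `mode` has a different type in each instantiation, so
    // decltype(mode)::value is a compile-time constant inside each copy.
    auto loops = [&](auto mode) -> std::optional<int> {
        for (const auto& row : grid) {
            for (int v : row) {
                if constexpr (decltype(mode)::value) {
                    if (v < 0) return v;      // mode A body
                } else {
                    if (v > limit) return v;  // mode B body
                }
            }
        }
        return std::nullopt;  // nothing matched
    };
    // The runtime flag is tested once, outside the loops.
    return isModeA ? loops(std::true_type{}) : loops(std::false_type{});
}
```

Each call to loops instantiates a separate copy of the loop nest, so neither copy carries the runtime flag into the inner loop.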

Upvotes: 1

Gem Taylor

Reputation: 5613

If you just want to push the problem "uphill" to the caller, in C++17 you could declare isModeA as a template parameter and use if constexpr (isModeA), so your code still looks the same.

The caller would then have to choose to call Foo<true>() or Foo<false>(), or you could write another Foo(F isModeA) that contains that branching decision.
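
A rough sketch of that arrangement, assuming C++17. The names fooImpl and foo, the data parameter, and the sum / sum-of-squares bodies are hypothetical placeholders for the real shared code and mode-specific work:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: mode A sums the elements, mode B sums their squares.
template <bool IsModeA>
long long fooImpl(const std::vector<int>& data) {
    long long acc = 0;  // shared setup code would go here
    for (std::size_t i = 0; i < data.size(); ++i) {
        if constexpr (IsModeA) {
            acc += data[i];                                    // mode A body
        } else {
            acc += static_cast<long long>(data[i]) * data[i];  // mode B body
        }
    }
    return acc;
}

// Thin wrapper that makes the runtime decision exactly once.
long long foo(const std::vector<int>& data, bool isModeA) {
    return isModeA ? fooImpl<true>(data) : fooImpl<false>(data);
}
```

Callers that already know the mode at compile time can call fooImpl<true> or fooImpl<false> directly and skip the wrapper.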

But I still think this is premature optimisation, at least in this simplified world of the example.

Upvotes: 0

6502

Reputation: 114559

In your example you can avoid the inner-loop conditional by duplicating just a couple of lines of code (the nested for loops):

void foo(bool isModeA) {

    // a lot of shared code

    if (isModeA) {
       for(int i = 0; i < N; i++) {
           for(int j = 0; j < M; j++) {
              // do modeA code
           }
       }
    } else {
       for(int i = 0; i < N; i++) {
           for(int j = 0; j < M; j++) {
              // do modeB code
           }
       }
    }
}

Upvotes: 3

skypjack

Reputation: 50550

If you don't want to create two different foo functions, you can still use lambdas and invoke them in place with the right code to execute. This way the if is evaluated only once and everything is packed within foo.
As a minimal, working example:

template<typename F>
void foo(F isModeA) {
    // a lot of shared code

    [](auto f){
        for(int i = 0; i < 10; i++) {
            for(int j = 0; j < 10; j++) {
                f();
            }
        }
    }(isModeA()
        ? []() { /* code for modeA */ }
        : []() { /* code for modeB */ }
    );
}

int main() {
    foo([](){ return true; });
    foo([](){ return false; });
}

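Adjust the capture lists according to your requirements. Note that the conditional expression above compiles only while both lambdas are capture-free, because it relies on each of them converting to the same function pointer type.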
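
If the two bodies need to capture local state, one possible variant passes each body to a generic lambda from an ordinary if/else. This is only a sketch, assuming C++14; fillGrid and its arithmetic are hypothetical and not part of the original example:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: fill an n x m grid with i + j (mode A) or i * j (mode B).
// Assumes n and m are non-negative.
std::vector<int> fillGrid(int n, int m, bool isModeA) {
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(n) * static_cast<std::size_t>(m));

    // The loop nest is written once. `body` is a template parameter of the
    // generic lambda, so each call below gets its own instantiation.
    auto loops = [&](auto body) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                body(i, j);
            }
        }
    };

    // The flag is tested once; both bodies are free to capture local state.
    if (isModeA) {
        loops([&](int i, int j) { out.push_back(i + j); });
    } else {
        loops([&](int i, int j) { out.push_back(i * j); });
    }
    return out;
}
```

Because each body has its own closure type, the call inside the loops is a direct call that the compiler can inline, rather than a call through a function pointer.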

Upvotes: 1

CocoCrisp

Reputation: 815

How do you use isModeA? Is it modified inside the loop? If not, and you want to preserve the code structure, you can use a function pointer, which eliminates the if(isModeA) statement from the loop, somewhat like this:

void fooModeA(){}
void fooModeB(){}

void foo(bool isModeA) {

// a lot of shared code

  // pick the loop body once, before entering the loops
  void (*body)() = nullptr;
  if(isModeA){
    body = fooModeA;
  }
  else{
    body = fooModeB;
  }

  for(int i = 0; i < N; i++) {
     for(int j = 0; j < M; j++) {
        body();
     }
  }
}

Upvotes: 0

Bill Lynch

Reputation: 81976

Given the code:

const int N = 50;
const int M = 60;

void doModeA(int i, int j);
void doModeB(int i, int j);

void foo(bool isModeA) {
    for(int i = 0; i < N; i++) {
       for(int j = 0; j < M; j++) {
           if(isModeA){
              doModeA(i, j);
           } else {
              doModeB(i, j);
           }
       }
    }
}

Clang / LLVM will compile this into something like:

define void @_Z3foob(i1 zeroext) local_unnamed_addr #0 {
  br i1 %0, label %3, label %2

; <label>:2:                                      ; preds = %1
  br label %9

; <label>:3:                                      ; preds = %1
  br label %4

; <label>:4:                                      ; preds = %3, %4
  %5 = phi i32 [ %6, %4 ], [ 0, %3 ]
  tail call void @_Z7doModeAii(i32 %5, i32 0)
  tail call void @_Z7doModeAii(i32 %5, i32 1)
  ....
  tail call void @_Z7doModeAii(i32 %5, i32 58)
  tail call void @_Z7doModeAii(i32 %5, i32 59)
  %6 = add nuw nsw i32 %5, 1
  %7 = icmp eq i32 %6, 50
  br i1 %7, label %8, label %4

; <label>:8:                                      ; preds = %9, %4
  ret void

; <label>:9:                                      ; preds = %2, %9
  %10 = phi i32 [ %11, %9 ], [ 0, %2 ]
  tail call void @_Z7doModeBii(i32 %10, i32 0)
  tail call void @_Z7doModeBii(i32 %10, i32 1)
  ...
  tail call void @_Z7doModeBii(i32 %10, i32 58)
  tail call void @_Z7doModeBii(i32 %10, i32 59)
  %11 = add nuw nsw i32 %10, 1
  %12 = icmp eq i32 %11, 50
  br i1 %12, label %8, label %9
}

Which implements the optimization you're asking for. So just write the code that's readable, and let the compiler do its job. That's what it's there for.

Upvotes: 4
