PC Luddite
PC Luddite

Reputation: 6088

How can GCC unroll a loop if its number of iterations is unknown at compile time?

I was reading the optimization options for GCC when I found the option -funroll-all-loops.

Its description reads:

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. '-funroll-all-loops' implies the same options as '-funroll-loops'

How can the compiler unroll a loop if its number of iterations is unknown at compile time? Doesn't the compiler need this information to unroll it? What corresponding C code does it generate, and in what contexts could this be useful if it usually makes the programs run more slowly?

Upvotes: 13

Views: 3686

Answers (4)

Steve Kolokowsky
Steve Kolokowsky

Reputation: 423

Here's some C code showing how to do it:

int iterations = 100;
int unrollValue = 8;

while (iterations%unrollvalue)
{
   // insert loop code here
   iterations--;
}

while (iterations)
{
   // insert unrollValue copies of loop code here
   iterations-= unrollValue;
}

The compiler will replace the first loop with a relative jump, but that's not easy to represent in C. Note that unrolling by a power of 2 allows the compiler to use a mask instead of the (expensive) divide operation.

Upvotes: 4

Shafik Yaghmour
Shafik Yaghmour

Reputation: 158449

in what contexts could this be useful if it usually makes the programs run more slowly?

Well they are assuming if you choose this option you know what you are doing, if you don't you should not use this option.

what is gcc going to do, well I used this sample program:

#include <stdio.h>

void f(int j )
{
  for( int k = 0; k < j; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

and tested it with godbolt and it generates a jump table based on the number of iterations left (see it live):

cmpl    $1, %ebp
movl    $1, %ebx
je  .L1
testl   %r12d, %r12d
je  .L27
cmpl    $1, %r12d
je  .L28
cmpl    $2, %r12d
je  .L29
cmpl    $3, %r12d
je  .L30
cmpl    $4, %r12d
je  .L31
cmpl    $5, %r12d
je  .L32
cmpl    $6, %r12d
je  .L33

Upvotes: 3

Mike Dunlavey
Mike Dunlavey

Reputation: 40649

It can do something like:

while(n >= 8){
  foo(); foo(); foo(); foo(); foo(); foo(); foo(); foo(); 
  n -= 8;
}
while(n > 0){
  foo();
  n--;
}

Of course, Duff's Device would save having to write the second loop.

Why do it? That's up to the user. If foo() spends more than a few cycles, or if the original loop takes less that 5% of overall wall-clock time, or if n is typically small, it's probably not worth the trouble.

Upvotes: 2

Toby Speight
Toby Speight

Reputation: 30709

You can't assume that there's such a thing as corresponding C code for a compiler's intermediate representation. But in this case, I'd expect the nearest equivalent to be something like Duff's Device, which is a sequence (usually in a loop) that can be entered at a computed location.

Upvotes: 0

Related Questions