einpoklum

Reputation: 131435

What do we know about the "strength" of nvcc's #pragma unroll?

What do we know about the unrolling capabilities of nvcc when it encounters a #pragma unroll directive? How sophisticated is it? Has anyone experimented with increasingly complex loop structures to see where it gives up?

For example,

#pragma unroll
for(int i = 0; i < constexpr_value; i++) { foo(i); }

will surely unroll (up to a rather large trip count, see this answer). What about:

#pragma unroll
for(int i = 0;  i < runtime_variable_value and i < constexpr_value; i++) {
    foo(i); 
}

The loop trip count is not known at compile time here, but it has a constant upper bound, so the loop could still be completely unrolled, with a conditional jump guarding each iteration.
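To make the idea concrete, here is a hand-written sketch of what such a complete unrolling could look like for an upper bound of 4. The names `kMaxTrips`, `bounded_loop` and `bounded_loop_unrolled` are made up for illustration, and `foo` is given an accumulator argument only so its effect can be observed. This shows the transformation I have in mind; it is not a claim about what nvcc actually generates.

```cpp
constexpr int kMaxTrips = 4;  // hypothetical stand-in for constexpr_value

// Hypothetical loop body: accumulates i so the effect is observable.
inline void foo(int i, int& acc) { acc += i; }

// Reference version: runtime trip count, capped by a compile-time bound.
inline int bounded_loop(int runtime_n) {
    int acc = 0;
    for (int i = 0; i < runtime_n && i < kMaxTrips; i++) { foo(i, acc); }
    return acc;
}

// Hand-unrolled version: one copy of the body per possible iteration,
// each preceded by the runtime check that the loop condition would perform.
inline int bounded_loop_unrolled(int runtime_n) {
    int acc = 0;
    if (runtime_n <= 0) return acc;
    foo(0, acc);
    if (runtime_n <= 1) return acc;
    foo(1, acc);
    if (runtime_n <= 2) return acc;
    foo(2, acc);
    if (runtime_n <= 3) return acc;
    foo(3, acc);
    return acc;  // i == kMaxTrips: the constant bound ends the loop
}
```

Both functions return the same value for any `runtime_n`; the second has no loop counter or back edge left, only forward conditional exits.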

And then, what about:

template <typename T>
constexpr T simple_min(const T& x, const T& y) { return x < y ? x : y; }

#pragma unroll
for(int i = 0;  i < simple_min(runtime_variable_value, constexpr_value); i++) {      
    foo(i); 
}

which should compile to the same thing as the above?

Note: if you intend to answer "conduct your own experiments", I do intend to do that, at least for my examples, and to look at the PTX if nobody already knows the general answer, in which case I'll partially answer this question myself. But I would prefer something more authoritative and based on wider experience.

Upvotes: 1

Views: 1906

Answers (1)

talonmies

Reputation: 72349

The rules of unrolling are extremely simple: if the compiler cannot deduce the loop trip count as an integral constant value, it will not automatically unroll the loop. In that case it will also emit a warning to tell you so.

If you have code with a non-constant loop trip count, you may still be able to force the compiler to unroll by adding an integral constant expression with a value greater than one after the unroll pragma (e.g. #pragma unroll 8).
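As a minimal sketch of the two forms, assuming hypothetical names (`HOST_DEVICE`, `kTaps`, `sum_fixed`, `sum_runtime`) that are not taken from the question: the first loop has a compile-time trip count and uses the bare pragma, while the second has a runtime trip count and passes an explicit unroll factor. The macro only exists so that the same source also builds as plain C++ for a host-side check; a host compiler that does not recognize the pragma will typically ignore it, possibly with a warning.

```cpp
// Under nvcc the functions are usable from both host and device code;
// under a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

constexpr int kTaps = 4;  // hypothetical compile-time trip count

// Trip count is an integral constant: a bare "#pragma unroll"
// requests complete unrolling of the loop.
HOST_DEVICE inline float sum_fixed(const float* x) {
    float acc = 0.0f;
#pragma unroll
    for (int i = 0; i < kTaps; i++) { acc += x[i]; }
    return acc;
}

// Trip count is only known at run time: request an explicit
// unroll factor (8 here) instead of complete unrolling.
HOST_DEVICE inline float sum_runtime(const float* x, int n) {
    float acc = 0.0f;
#pragma unroll 8
    for (int i = 0; i < n; i++) { acc += x[i]; }
    return acc;
}
```

Whether the compiler honoured either request is best confirmed by inspecting the generated PTX, since the pragma changes code generation and not the result.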

All of this is extremely clearly discussed in the relevant section of the documentation.

Upvotes: 2
