What do we know about the "strength" of nvcc's #pragma unroll?

Question

What do we know about nvcc's unrolling capabilities when it encounters a #pragma unroll directive? How sophisticated is it? Has anyone experimented with progressively more complex loop structures to see where it gives up?

For example,

#pragma unroll
for(int i = 0; i < constexpr_value; i++) { foo(i); }

will surely unroll (up to a rather large trip count, see this answer). What about:

#pragma unroll
for(int i = 0;  i < runtime_variable_value and i < constexpr_value; i++) {
    foo(i); 
}

The loop trip count is not known here, but it has a constant upper bound, and complete unrolling of the loop can be performed, with some conditional jumps.
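To make "complete unrolling with some conditional jumps" concrete, here is a hand-written sketch in plain host C++ of what such a transformation would amount to. It is not what nvcc actually emits. The names `kBound`, `run_loop`, `run_unrolled`, and the trace-recording `foo` are hypothetical stand-ins, and the bound of 4 is an arbitrary assumption.

```cpp
#include <vector>

constexpr int kBound = 4;  // assumed stand-in for constexpr_value

// Hypothetical stand-in for foo(i): records which iterations ran.
inline void foo(int i, std::vector<int>& trace) { trace.push_back(i); }

// The loop as written in the question.
std::vector<int> run_loop(int runtime_variable_value) {
    std::vector<int> trace;
    for (int i = 0; i < runtime_variable_value && i < kBound; i++) {
        foo(i, trace);
    }
    return trace;
}

// Hand-unrolled equivalent: kBound copies of the body,
// each preceded by a conditional early exit on the runtime bound.
std::vector<int> run_unrolled(int runtime_variable_value) {
    std::vector<int> trace;
    if (0 >= runtime_variable_value) return trace;
    foo(0, trace);
    if (1 >= runtime_variable_value) return trace;
    foo(1, trace);
    if (2 >= runtime_variable_value) return trace;
    foo(2, trace);
    if (3 >= runtime_variable_value) return trace;
    foo(3, trace);
    return trace;
}
```

The unrolled form has no loop counter and no back edge, only forward branches, which is the shape one would look for in the generated PTX.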

And then, what about:

template <typename T>
constexpr T simple_min(const T& x, const T& y) { return x < y ? x : y; }

#pragma unroll
for(int i = 0;  i < simple_min(runtime_variable_value, constexpr_value); i++) {      
    foo(i); 
}

which should compile to the same thing as the above?
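As a sanity check on the premise that the two loop conditions are equivalent, the following host-side sketch compares them exhaustively over a small range at compile time (C++14 or later). It only establishes that the conditions agree semantically; whether nvcc treats them the same when unrolling is exactly the open question. `kBound`, `cond_conjunction`, `cond_min`, and `conditions_agree` are hypothetical names.

```cpp
template <typename T>
constexpr T simple_min(const T& x, const T& y) { return x < y ? x : y; }

constexpr int kBound = 4;  // assumed stand-in for constexpr_value

// Condition from the second example: i < runtime value and i < constant.
constexpr bool cond_conjunction(int i, int n) { return i < n && i < kBound; }

// Condition from the third example: i < min(runtime value, constant).
constexpr bool cond_min(int i, int n) { return i < simple_min(n, kBound); }

// Compare both conditions for every (n, i) pair in a small range.
constexpr bool conditions_agree() {
    for (int n = -2; n <= 8; ++n) {
        for (int i = 0; i <= 8; ++i) {
            if (cond_conjunction(i, n) != cond_min(i, n)) return false;
        }
    }
    return true;
}

static_assert(conditions_agree(), "the two loop conditions should be equivalent");
```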

Note: If you intend to answer "conduct your own experiments": I do intend to do that, at least for my examples, and to look at the PTX if nobody already knows the general answer, in which case I'll post a partial answer to this question. But I would prefer something more authoritative and based on wider experience.
