Reputation: 131435
What do we know about the unrolling capabilities of nvcc when it encounters a #pragma unroll directive? How sophisticated is it? Has anyone experimented with increasingly complex loop structures to see where it gives up?
For example,
#pragma unroll
for(int i = 0; i < constexpr_value; i++) { foo(i); }
will surely unroll (up to a rather large trip count, see this answer). What about:
#pragma unroll
for(int i = 0; i < runtime_variable_value and i < constexpr_value; i++) {
    foo(i);
}
The exact loop trip count is not known at compile time here, but it has a constant upper bound, so the loop could still be completely unrolled, with a conditional jump guarding each iteration.
And then, what about:
template <typename T>
constexpr T simple_min(const T& x, const T& y) { return x < y ? x : y; }
#pragma unroll
for(int i = 0; i < simple_min(runtime_variable_value, constexpr_value); i++) {
    foo(i);
}
which should compile to the same thing as the above?
Note: If you intend to answer "conduct your own experiments", I do intend to do that, at least for my examples, and to look at the PTX if nobody already knows the general answer; in that case I'll post a partial answer to this question. But I would prefer something more authoritative and based on wider experience.
Upvotes: 1
Views: 1906
Reputation: 72349
The rules of unrolling are extremely simple -- if the compiler cannot deduce the loop trip count as an integral constant value, it will not automatically unroll the loop. In this case it will also emit a warning informing you of this.
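To make the rule above concrete, here is a minimal sketch contrasting a loop whose trip count is an integral constant with one whose trip count depends on a runtime value. The names HOST_DEVICE, kTripCount, sum_constant_bound and sum_runtime_bound are hypothetical and only serve the illustration. Whether a given nvcc version actually unrolls either loop is something to confirm by inspecting the generated code, not something this sketch demonstrates.

```cpp
// HOST_DEVICE is a hypothetical helper macro: it expands to the CUDA
// qualifiers under nvcc and to nothing under a plain host C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

constexpr int kTripCount = 4;  // hypothetical compile-time constant

// Trip count is an integral constant, so a bare "#pragma unroll"
// requests complete unrolling of this loop.
HOST_DEVICE inline int sum_constant_bound(const int* data) {
    int sum = 0;
#pragma unroll
    for (int i = 0; i < kTripCount; i++) {
        sum += data[i];
    }
    return sum;
}

// Trip count depends on the runtime value n (capped at kTripCount).
// Per the rule stated above, a bare "#pragma unroll" is not expected
// to unroll this loop automatically.
HOST_DEVICE inline int sum_runtime_bound(const int* data, int n) {
    int sum = 0;
#pragma unroll
    for (int i = 0; i < n && i < kTripCount; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same sum whether or not the loop is unrolled, so the only way to see what the compiler did is to look at its output, for example the PTX produced by nvcc -ptx.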
If you have code with a non-constant loop trip count, you may still be able to force the compiler to unroll by adding an integral constant expression with a value greater than one after the unroll pragma (e.g. #pragma unroll 8).
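A minimal sketch of that suggestion, assuming a loop whose trip count n is only known at run time. HOST_DEVICE and weighted_sum are hypothetical names, and the factor 8 is just the value used in the example above, not a recommendation.

```cpp
// HOST_DEVICE is a hypothetical helper macro: it expands to the CUDA
// qualifiers under nvcc and to nothing under a plain host C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// The trip count n is a runtime value, so the pragma carries an explicit
// integral constant expression (8) as the requested unroll factor.
// The result is the same whether or not the compiler honors the request.
HOST_DEVICE inline int weighted_sum(const int* data, int n, int weight) {
    int sum = 0;
#pragma unroll 8
    for (int i = 0; i < n; i++) {
        sum += weight * data[i];
    }
    return sum;
}
```

The pragma changes only the generated code, not the result, so comparing the compiler output with and without the factor is the way to check its effect.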
All of this is extremely clearly discussed in the relevant section of the documentation.
Upvotes: 2