Reputation: 107
I'm profiling my CUDA application, and I've come across something that I don't understand about the "Control Flow Divergence" metric that is present in the Visual Profiler.
According to the User Guide:
Control flow divergence gives the percentage of thread instructions that were not executed by all threads in the warp, hence causing divergence.
I've got the following code in my CUDA kernel:
int var;
var = tex2D(texture, x, y); // texture fetch
if (var < 0) {
    var *= -1;
    results[(blockIdx.x*blockDim.x) + threadIdx.x] = var; // global memory array
}
Here's what happens: not a single thread enters the branch (I checked the values in global memory), but the profiler reports a control flow divergence of 34%. If I insert a printf into that same branch, the value jumps to 43% (and, oddly, the execution time increases as well), even though nothing is printed to stdout. Does this mean the metric takes all of the kernel's instructions into account, even those not executed by any thread (so that there is effectively no warp divergence)?
In both cases the Divergent Branches metric is 0%.
Upvotes: 0
Views: 707
Reputation: 21108
Which CUDA toolkit version are you using? It sounds like an old one, so it may be worth updating to a more recent release (e.g. 4.2 or 5.0; the latter is currently a release candidate).
If you're able to update to the CUDA 5.0 Visual Profiler, then by analysing the specific kernel you can have the profiler highlight the exact lines in your kernel that are suffering from divergence (and likewise for non-coalesced memory accesses). For this to work you'll need to compile your code with either debug info (-G) or, if you want to profile release code, with line info (-lineinfo).
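As for the number itself, I'm not certain how the profiler derives it, but one plausible reading of the User Guide definition quoted in the question is "the share of lane slots that stayed idle across all issued warp instructions". The host-side sketch below works through that reading only. The helper names and the per-instruction active-mask trace are hypothetical and not part of any profiler API, and the formula is an assumption rather than the documented calculation.

```cpp
#include <cstdint>
#include <vector>

// Assumed warp width (32 lanes per warp).
constexpr int kWarpSize = 32;

// Number of lanes (threads) set in a 32-bit active mask.
int active_lanes(std::uint32_t mask) {
    int n = 0;
    for (; mask != 0; mask >>= 1) {
        n += static_cast<int>(mask & 1u);
    }
    return n;
}

// Hypothetical helper: percentage of lane slots left idle over a trace of
// issued warp instructions, where each entry is that instruction's active mask.
// ASSUMPTION: this is one reading of the definition, not the profiler's code.
double control_flow_divergence(const std::vector<std::uint32_t>& issued_masks) {
    if (issued_masks.empty()) {
        return 0.0;
    }
    long long thread_instructions = 0;
    for (std::uint32_t mask : issued_masks) {
        thread_instructions += active_lanes(mask);
    }
    const long long lane_slots =
        static_cast<long long>(kWarpSize) *
        static_cast<long long>(issued_masks.size());
    return 100.0 * static_cast<double>(lane_slots - thread_instructions) /
           static_cast<double>(lane_slots);
}
```

For a trace of three full-warp instructions (mask 0xFFFFFFFF) and one instruction run by only 16 lanes (mask 0x0000FFFF), this gives 16 idle slots out of 128, i.e. 12.5%. Under this reading any inactive lane counts, whatever the cause (for instance a block size that is not a multiple of 32), so a non-zero value would not by itself prove that a branch was taken divergently.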
Upvotes: 1