Reputation: 9090
Is there a way to get detailed information about how an OpenCL kernel was compiled on Nvidia platforms (or on other platforms)? Either external tools or tests that can be put into the kernel would work. Specifically:
Did vectorization succeed, and how did the work items get grouped into warps?
If work items inside a work group take different branches, did the compiler optimize the code so that they still execute in parallel?
Did private memory variables get mapped to registers in the multiprocessor, or were they put into local/global memory? (Some architectures have more private memory per work group than local memory)
Can this information be seen in the PTX assembly output, or is PTX still too high-level to show it?
Upvotes: 0
Views: 135
Reputation: 5754
You can always just generate the PTX assembly and look at it:
program.build("-cl-fast-relaxed-math");
// On Nvidia platforms, CL_PROGRAM_BINARIES returns the PTX text of the compiled program
std::cout << program.getInfo<CL_PROGRAM_BINARIES>()[0] << std::endl;
In PTX you see exactly how the compiler translated the OpenCL code. Find the PTX documentation here.
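For completeness, here is a minimal self-contained sketch using the plain OpenCL C API that does the same thing and writes the result to a file. The kernel source, the file name kernel.ptx, the single-GPU-device assumption, and the omitted error checking are just illustration, not part of the answer above; on Nvidia platforms the returned "binary" is PTX text.

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Hypothetical example kernel, only here to have something to compile
    const char *src =
        "__kernel void add(__global const float *a, __global const float *b, "
        "                  __global float *c) { "
        "    int i = get_global_id(0); "
        "    c[i] = a[i] + b[i]; "
        "}";

    cl_platform_id platform;
    cl_device_id device;
    cl_int err;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &device, "-cl-fast-relaxed-math", NULL, NULL);

    // CL_PROGRAM_BINARIES returns one buffer per device; query its size first
    size_t bin_size;
    clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(bin_size), &bin_size, NULL);

    unsigned char *bin = (unsigned char *)malloc(bin_size);
    clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(bin), &bin, NULL);

    // On Nvidia the device binary is PTX text; dump it for inspection
    FILE *f = fopen("kernel.ptx", "wb");
    fwrite(bin, 1, bin_size, f);
    fclose(f);

    free(bin);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}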
Upvotes: 0
Reputation: 1129
This is all compiler-level metadata; some of it is available through the generic OpenCL API, but the details you are asking about are much too low-level. They might be exposed through some Nvidia OpenCL extension, though I'm not familiar with those. Your best bet is probably to find tools that work at the PTX level and feed them the OpenCL program binaries.
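Along those lines, one hedged sketch: if the platform supports Nvidia's compiler-options extension (cl_nv_compiler_options), the -cl-nv-verbose build flag is supposed to add ptxas statistics (register and local-memory usage per kernel) to the build log. Here program and device are assumed to be an already-created cl::Program and cl::Device; other platforms may reject or silently ignore the flag.

program.build("-cl-nv-verbose");  // Nvidia-specific build option, not core OpenCL
std::cout << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device) << std::endl;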
Upvotes: 1