I'm trying to speed up my code with OpenACC using the PGI 15.7 compiler.
I want to profile my code at the C source level. I'm using the 'nvvp' profiler from CUDA 7.0. When I run nvvp, I can use the 'Analysis' tab and see which kind of latency is slowing my code down (data dependency, conditional branches, bandwidth, etc.).
However, I only get kernel-level analysis, not line-based analysis (e.g. the main_300_gpu kernel took 10 s), so I have trouble figuring out where in the code I need to make changes.
Is there any way to profile my code at the source level?
I'm using:
PGI 15.7 (using pgcc)
CUDA 7.0
NVIDIA GTX 960
Ubuntu 14.04 LTS x86_64
You can also try adding the flag "-ta=tesla:lineinfo" so that the compiler adds source-code association for the profiler (it's the equivalent of nvcc's -lineinfo flag). Though, as Bob points out, the code may be heavily transformed, so the line information may not correspond directly to your original source.
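As a sketch of that suggestion, here is a small OpenACC file with the build line in a comment. The file name `saxpy.c` and the `saxpy` function are hypothetical, made up for illustration. `-acc` and `-Minfo=accel` are standard pgcc options added here as an assumption about a typical build; `-Minfo=accel` makes the compiler report, per source line, which loops it turned into GPU kernels, which also helps map a kernel name back to the code. Without OpenACC enabled the pragma is ignored and the loop runs on the host, so the file also compiles with an ordinary C compiler.

```c
/* Hypothetical example file: saxpy.c
 * Possible PGI build with source-line info for the profiler:
 *   pgcc -acc -ta=tesla:lineinfo -Minfo=accel saxpy.c -o saxpy
 * -Minfo=accel prints which loops were offloaded, with their line numbers.
 */
#include <stddef.h>

/* y[i] = a * x[i] + y[i]; with -acc, PGI generates one GPU kernel for the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Profiling a binary built this way (with a `main` that calls `saxpy`) lets nvvp attempt source association, with the caveat above that the transformed code may not line up exactly with these lines.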
At the current time (with CUDA 7.5 or higher and a GPU of compute capability 5.2 or higher), the nvvp profiler can associate various kinds of sampled execution activity with lines of CUDA C/C++ source code.
However, this capability does not currently extend to OpenACC C/C++ (or Fortran) source lines.
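Even without line-level association, the kernel-level report carries some source information: PGI typically names each generated kernel after the enclosing function and the source line of its compute region, as in `main_300_gpu` from the question. The helper below is a hypothetical sketch (the name `parse_acc_kernel_name` is made up) that splits such a name, assuming the `<function>_<line>_gpu` pattern holds; it only parses the string and does not consult the compiler or the profiler.

```c
#include <stdlib.h>
#include <string.h>

/* Split a kernel name such as "main_300_gpu" into the function name ("main")
 * and the source line (300). ASSUMES the <function>_<line>_gpu naming pattern.
 * Returns 0 on success, -1 if the name does not match or func is too small. */
int parse_acc_kernel_name(const char *kernel, char *func, size_t func_size,
                          long *line)
{
    const char *suffix = "_gpu";
    size_t len = strlen(kernel);
    size_t slen = strlen(suffix);

    /* The name must end in "_gpu". */
    if (len <= slen || strcmp(kernel + len - slen, suffix) != 0)
        return -1;

    /* Walk backwards over the line-number digits just before "_gpu". */
    size_t end = len - slen;
    size_t start = end;
    while (start > 0 && kernel[start - 1] >= '0' && kernel[start - 1] <= '9')
        --start;

    /* Need at least one digit, preceded by '_' and a non-empty function name. */
    if (start == end || start < 2 || kernel[start - 1] != '_')
        return -1;

    size_t flen = start - 1;              /* length of the function name */
    if (flen + 1 > func_size)
        return -1;
    memcpy(func, kernel, flen);
    func[flen] = '\0';
    *line = strtol(kernel + start, NULL, 10);
    return 0;
}
```

Function names that themselves contain underscores (e.g. `my_func_12_gpu`) still split correctly, because the parser anchors on the trailing digits rather than the first underscore.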
It should still be possible to associate the activity with the disassembly, and it may also be possible to associate it with the intermediate C source files produced by the PGI nollvm option. Neither of these will bear much resemblance to your OpenACC source code, however.
Another option for profiling OpenACC code with PGI tools is to set the environment variable PGI_ACC_TIME=1 before running your program. This enables a lightweight profiler built into the runtime that reports the execution characteristics of your OpenACC code, in particular the parts associated with accelerator regions. The output is annotated so you can refer back to lines of your source code.