soongk

Reputation: 279

Can I profile an OpenACC kernel at the C source-code level?

I'm trying to speed up my code with OpenACC using the PGI 15.7 compiler.

I want to profile my code at the C source level. I'm using the 'nvvp' profiler from CUDA 7.0. When I run nvvp, I can use the 'Analysis' tab to see which kind of latency is slowing my code down (data dependency, conditional branch, bandwidth, etc.).

However, I can only get kernel-level analysis, not line-based analysis (e.g. the main_300_gpu kernel took 10 s), so it's hard to tell which part of the code I need to fix.

Is there any way to profile my code at the source level?

I'm using

PGI 15.7 (using pgcc)

CUDA 7.0

NVIDIA GTX 960

Ubuntu 14.04 LTS x86_64

[my nvvp reporting screenshots]

Upvotes: 0

Views: 308

Answers (2)

Mat Colgrove

Reputation: 5646

You can also try adding the flag "-ta=tesla:lineinfo" to have the compiler add source code association for the profiler (it's the equivalent of nvcc's -lineinfo option). Though as Bob points out, the code may have been heavily transformed, so the line information may not correspond directly to your original source.

Upvotes: 3

Robert Crovella

Reputation: 152113

At the current time (and on CUDA 7.5 or higher, with a cc5.2 or higher GPU), the nvvp profiler can associate various kinds of sampled execution activity with CUDA C/C++ lines of source code.

However, at the present time, this capability does not extend to OpenACC C/C++ (or Fortran) lines of source code.

It should still be possible to associate the activity with the disassembly, and it may be possible to associate it with the intermediate C source files produced by the PGI nollvm option. Neither of these will bear much resemblance to your OpenACC source code, however.

Another option for profiling OpenACC codes using PGI tools is to set the PGI_ACC_TIME=1 environment variable before executing your code. This will enable a lightweight profiler built into the runtime to give some analysis of the execution characteristics of your OpenACC code, in particular those parts associated with accelerator regions. The output is annotated so you can refer back to lines of your source code.

Upvotes: 1
