Can I profile an OpenACC kernel at the C source-code level?

Question

I'm trying to speed up my code with OpenACC, using the PGI 15.7 compiler.

I want to profile my code at the C source level. I'm using the 'nvvp' profiler from CUDA 7.0. When I run nvvp, I can use the 'Analysis' tab to see which kind of latency is slowing my code down (data dependency, conditional branch, bandwidth, etc.).

However, I can't get a line-based analysis, only a kernel-level analysis (e.g. the main_300_gpu kernel took 10 s), so it's hard to tell which part of the code I need to fix.

Is there any way to profile my code at the source level?

I'm using

PGI 15.7 (using pgcc)

CUDA 7.0

NVIDIA GTX 960

Ubuntu 14.04 LTS x86_64

[my nvvp reporting screenshots]

Mat Colgrove · Accepted Answer

You can also try adding the flag "-ta=tesla:lineinfo" to have the compiler add source code association for the profiler (it's the same as nvcc's -lineinfo flag). Though, as Bob points out, the code may have been heavily transformed, so the line information may not correspond directly back to your original source.
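As a minimal sketch of how the flag might be tried out, the hypothetical file below (the names `saxpy.c`, `saxpy`, and `run_demo` are illustrative and do not come from the question) contains a single OpenACC loop. The compile line in the header comment assumes the PGI `-acc` and `-Minfo=accel` options in addition to the `-ta=tesla:lineinfo` flag mentioned above. Without `-acc`, the pragma is ignored and the loop simply runs on the host.

```c
/* saxpy.c: hypothetical example for trying out line-level profiling.
 *
 * Assumed PGI compile line (add a main() that calls run_demo()):
 *   pgcc -acc -ta=tesla:lineinfo -Minfo=accel saxpy.c -o saxpy
 *
 * -Minfo=accel makes the compiler report which source loops it turned
 * into GPU kernels; lineinfo embeds source-line data for the profiler.
 */
#include <assert.h>
#include <stdlib.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * With -acc the loop is offloaded; otherwise it runs on the CPU. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Runs the kernel on a small input and checks the result.
 * Returns 0 on success, 1 if memory allocation failed. */
int run_demo(void)
{
    enum { N = 1 << 20 };
    float *x = malloc(N * sizeof *x);
    float *y = malloc(N * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return 1;
    }

    for (int i = 0; i < N; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy(N, 3.0f, x, y);

    /* 3 * 1 + 2 is exactly representable in float, so == is safe here. */
    for (int i = 0; i < N; ++i) {
        assert(y[i] == 5.0f);
    }

    free(x);
    free(y);
    return 0;
}
```

Profiling the resulting binary in nvvp should then show the generated kernel with source-line association where the compiler was able to preserve it. As the answer notes, the mapping back to the original lines may be approximate after the compiler's transformations.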
