behzad baghapour

Reputation: 127

OpenACC shared memory usage

I am working with OpenACC using the PGI compiler. How can I profile the code's memory usage, especially the shared memory, at runtime?

Thank you so much for your help!

Behzad

Upvotes: 0

Views: 958

Answers (1)

jefflarkin

Reputation: 1279

I'm assuming you mean "shared memory" in the CUDA sense (the fast, per-SM shared memory on NVIDIA GPUs). In this case, you have a few options.

First, if you just want to know how much shared memory is being used, this can be determined at compile time by adding -Mcuda=ptxinfo.

pgcc -fast -ta=tesla:cc35 laplace2d.c -Mcuda=ptxinfo
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'main_61_gpu' for 'sm_35'
ptxas info    : Function properties for main_61_gpu
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 26 registers, 368 bytes cmem[0]
ptxas info    : Compiling entry function 'main_65_gpu_red' for 'sm_35'
ptxas info    : Function properties for main_65_gpu_red
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 18 registers, 368 bytes cmem[0]
ptxas info    : Compiling entry function 'main_72_gpu' for 'sm_35'
ptxas info    : Function properties for main_72_gpu
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 18 registers, 344 bytes cmem[0]

In the above case, it doesn't appear that I'm using any shared memory. (Follow-up: I spoke with a PGI compiler engineer and learned that the shared memory is sized dynamically at kernel launch, so it will not show up via ptxinfo.)
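A short, hand-written CUDA sketch (my own illustration, not code generated by the PGI compiler) may help clarify why ptxinfo can't see this: dynamically allocated shared memory is declared extern __shared__ inside the kernel, and its size is only supplied as the third <<<...>>> launch argument at runtime, so ptxas only reports statically sized allocations.

#include <cstdio>

__global__ void sum_kernel(const float *in, float *out, int n)
{
    // Dynamic shared memory: the size is chosen by the host at launch time,
    // so ptxas sees no fixed allocation here.
    extern __shared__ float tile[];

    int tid = threadIdx.x;
    tile[tid] = (tid < n) ? in[tid] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        *out = tile[0];
}

int main()
{
    const int n = 256;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    // Third launch argument = bytes of dynamic shared memory per block,
    // decided only at runtime.
    size_t shmem_bytes = n * sizeof(float);
    sum_kernel<<<1, n, shmem_bytes>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("sum of %d ones = %f\n", n, *out);
    cudaFree(in);
    cudaFree(out);
    return 0;
}

The size passed at launch (here n * sizeof(float)) can vary from launch to launch, which is exactly why the runtime can size the allocation per kernel invocation rather than baking it in at compile time.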

You can also use the NVIDIA Visual Profiler to get at this information. If you gather a GPU timeline and then click on an instance of a particular kernel, the properties panel should open and show the shared memory per block. In my case, ptxinfo above showed 0 bytes of shared memory while the Visual Profiler showed some shared memory in use, so I'll need to dig into why.

You can get some info at runtime too. If you're comfortable on the command line, you can use nvprof:

# Analyze load/store transactions
$ nvprof -m shared_load_transactions,shared_store_transactions ./a.out
# Analyze shared memory efficiency
# This will result in a LOT of kernel replays.
$ nvprof -m shared_efficiency ./a.out

This doesn't show the amount used, but does give you an idea of how it's used. The Visual Profiler's guided analysis will give you some insight into what these metrics mean.
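If you also want the amount of shared memory per launch at runtime (rather than just usage metrics), nvprof's GPU trace prints the static and dynamic shared memory for each kernel invocation; on the nvprof versions I've used, the columns are labelled SSMem and DSMem:

# Show per-launch static (SSMem) and dynamic (DSMem) shared memory
$ nvprof --print-gpu-trace ./a.out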

Upvotes: 1
