Reputation: 113
I'm working with a Tesla P100 having compute capability 6.0. I'd like to find a tool to automatically get the best grid and block sizes w.r.t. my kernel code.
I recently discovered the CUDA Occupancy Calculator (the .xls spreadsheet). But I realized the copy I found is a bit outdated (it only covers compute capabilities up to 2.1).
I tried to search for a newer spreadsheet, including higher C.C., but nothing showed up.
So I searched for an alternative and I found that from CUDA 6.5 on, Occupancy APIs were introduced. Is this the newer alternative to the spreadsheet?
Furthermore, I found this tool from GitHub. Can I consider this as an alternative? Or is it better to use Occupancy APIs?
Also, can CUDA profilers (nvprof or Nsight) do estimations about occupancy and give some optimal block/grid size?
I'm quite new to these tools.
Upvotes: 1
Views: 1536
Reputation: 59
There is a very complete tool that helps you find the best configuration. Check my configuration and the graphs: you want the red dots to be at the peak of each graph.
You can check it out at https://xmartlabs.github.io/cuda-calculator/
Upvotes: -1
Reputation: 152269
An updated version of the CUDA occupancy calculator spreadsheet ships with the CUDA toolkit, so when you install the CUDA toolkit, the Excel spreadsheet is also installed on your machine. The easiest way to locate it is probably to use a file find utility for your OS.
The CUDA occupancy API allows you to make the same calculations at runtime.
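As a rough illustration of the runtime approach, here is a minimal sketch that asks the occupancy API for a block size and then reports the theoretical occupancy at that size. The `scale` kernel, the `LaunchConfig` struct, and the `suggest_launch_config` helper are hypothetical names made up for this example; only the `cudaOccupancy*` calls and the other `cuda*` runtime functions come from the CUDA runtime. Note that the suggested block size maximizes occupancy, which is not necessarily the same as the fastest configuration for your kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical example kernel; substitute your own.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical container for the values computed below.
struct LaunchConfig {
    int block_size;     // threads per block suggested by the occupancy API
    int min_grid_size;  // smallest grid that can reach maximum occupancy
    int grid_size;      // blocks needed to cover n elements
    float occupancy;    // theoretical occupancy at block_size (0..1)
};

// Hypothetical helper: query the occupancy API for the `scale` kernel.
LaunchConfig suggest_launch_config(int n)
{
    LaunchConfig cfg{};

    // Block size that maximizes theoretical occupancy
    // (no dynamic shared memory, no upper limit on block size).
    CHECK(cudaOccupancyMaxPotentialBlockSize(
        &cfg.min_grid_size, &cfg.block_size, scale, 0, 0));

    // Round up so that every element gets a thread.
    cfg.grid_size = (n + cfg.block_size - 1) / cfg.block_size;

    // Resident blocks per SM at that block size.
    int blocks_per_sm = 0;
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, scale, cfg.block_size, 0));

    // Theoretical occupancy = resident threads / max threads per SM.
    int dev = 0;
    cudaDeviceProp prop;
    CHECK(cudaGetDevice(&dev));
    CHECK(cudaGetDeviceProperties(&prop, dev));
    cfg.occupancy = (float)(blocks_per_sm * cfg.block_size) /
                    (float)prop.maxThreadsPerMultiProcessor;
    return cfg;
}

int main()
{
    const int n = 1 << 20;
    LaunchConfig cfg = suggest_launch_config(n);
    printf("block size %d, grid size %d (min grid %d), occupancy %.2f\n",
           cfg.block_size, cfg.grid_size, cfg.min_grid_size, cfg.occupancy);

    float *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));
    scale<<<cfg.grid_size, cfg.block_size>>>(d_data, 2.0f, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaFree(d_data));
    return 0;
}
```

Compiling with something like `nvcc -arch=sm_60` should match the P100 from the question. The printed occupancy is the theoretical value; the achieved occupancy still has to come from the profiler.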
NVIDIA profilers offer some capability to inspect achieved occupancy. For example, nvvp can display achieved occupancy, and there is a metric for achieved occupancy which you can gather with nvprof. You may wish to simply search the profiler docs for the word "occupancy". These tools don't make estimations of optimal block and grid sizes, but they may give an indication as to whether occupancy may be a performance limiter for your application. These tools can also report the actual block and grid sizes for each kernel launch.
Upvotes: 5