I want to know how many FP32 and INT32 instructions are executed by a CUDA kernel during a launch. Is there any way to profile it via Nvidia Nsight Compute?
Is there any way to profile it via Nvidia Nsight Compute?
For Nsight Compute, the relevant metrics are as follows:
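FP32 instructions executed: smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
Integer instructions executed: smsp__sass_thread_inst_executed_op_integer_pred_on.sum
Both are thread-level counts: a warp-level instruction is counted once for each thread that executed it with its predicate on.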
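Both metrics can be collected in a single profiling run by passing them to `--metrics` as a comma-separated list. The following is a minimal sketch that wraps this in a hypothetical helper, `count_inst`. The name is only illustrative and is not part of ncu.

```shell
# Hypothetical helper: profile every kernel launch of the given application
# and report FP32 and integer thread-level instruction counts in one pass.
# ncu accepts a comma-separated list of metric names for --metrics.
FP32_METRIC=smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
INT_METRIC=smsp__sass_thread_inst_executed_op_integer_pred_on.sum

count_inst() {
  ncu --metrics "${FP32_METRIC},${INT_METRIC}" "$@"
}

# Usage: count_inst ./t2003
```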
Example:
$ ncu --metrics smsp__sass_thread_inst_executed_op_fp32_pred_on.sum ./t2003
...
==PROF== Disconnected from process 27520
kernel_1(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 16
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
smsp__sass_thread_inst_executed_op_fp32_pred_on.sum inst 10,240
---------------------------------------------------------------------- --------------- ------------------------------
kernel_2(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 17
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
smsp__sass_thread_inst_executed_op_fp32_pred_on.sum inst 10,240
---------------------------------------------------------------------- --------------- ------------------------------
$
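ncu reports one count per kernel launch, as in the output above. To get a total across all launches in a report, you can run a small awk filter over the default text output. This is a sketch, not an ncu feature. It assumes the layout shown above: metric name, then unit, then a value with comma thousands separators. `sum_metric` is a hypothetical helper name. If your ncu version scales the unit column, those rows are skipped rather than mis-summed.

```shell
# Hypothetical helper: sum one metric across all kernel launches in an
# ncu text report read from stdin. Assumes the default text layout shown
# above: "<metric name>  <unit>  <value>", with commas as thousands separators.
sum_metric() {
  awk -v m="$1" '
    # Only count rows for the requested metric whose unit column is plain "inst".
    $1 == m && $2 == "inst" {
      v = $3
      gsub(",", "", v)   # strip thousands separators, e.g. 10,240 -> 10240
      total += v
    }
    # %.0f avoids the 32-bit overflow some awk implementations have with %d.
    END { printf "%.0f\n", total }'
}

# Usage: ncu --metrics smsp__sass_thread_inst_executed_op_fp32_pred_on.sum ./t2003 \
#          | sum_metric smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
```

For the two launches above (10,240 each), this prints 20480.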
Note that recent versions of Nsight Compute are intended to be used on Volta and newer (compute capability 7.0 and higher) GPUs only.