qwq

Reputation: 1

How to count the instructions executed during a CUDA kernel launch

I want to know how many FP32 and INT32 instructions a CUDA kernel executes during a launch. Is there any way to profile it via Nvidia Nsight Compute?

Upvotes: 0

Views: 543

Answers (1)

Robert Crovella

Reputation: 151799

Is there any way to profile it via Nvidia Nsight Compute?

For Nsight Compute, the relevant metrics are:

fp32 instructions executed:    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
integer instructions executed: smsp__sass_thread_inst_executed_op_integer_pred_on.sum

Example:

$ ncu --metrics smsp__sass_thread_inst_executed_op_fp32_pred_on.sum ./t2003
...
==PROF== Disconnected from process 27520
[27520] [email protected]
  kernel_1(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 16
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum                               inst                         10,240
    ---------------------------------------------------------------------- --------------- ------------------------------

  kernel_2(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 17
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum                               inst                         10,240
    ---------------------------------------------------------------------- --------------- ------------------------------

$
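
The example above collects only the fp32 counter. Since the question asks for both, here is a minimal sketch of gathering both metrics in one profiling run, relying on `--metrics` accepting a comma-separated list. The variable names (`FP32`, `INT32`, `METRICS`) and the `command -v` guard are illustrative additions, not part of the original example; `./t2003` is the same test binary used above.

```shell
# Metric names are the two listed earlier in this answer.
FP32=smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
INT32=smsp__sass_thread_inst_executed_op_integer_pred_on.sum

# Join them with a comma so a single run reports both counters.
METRICS="$FP32,$INT32"

# Run the profiler only if it is on PATH; ./t2003 is the example binary from above.
if command -v ncu >/dev/null 2>&1; then
    ncu --metrics "$METRICS" ./t2003
else
    echo "ncu not found; would run: ncu --metrics $METRICS ./t2003"
fi
```

Each profiled kernel should then show two rows in its "Command line profiler metrics" section, one per metric.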

Note that recent versions of Nsight Compute are intended for use only on Volta and newer GPUs (compute capability 7.0 and higher).

Upvotes: 3
