Severus Snape

Reputation: 23

How to generate a roofline analysis with Nsight Compute?

While analyzing the performance of a kernel, I used the ncu command to generate a report. However, the report does not display the roofline analysis under the "GPU Speed of Light Throughput" section. Instead, it shows this message:

The ratio of peak float (fp32) to double (fp64) performance on this device is 64:1. The kernel achieved 0% of this device's fp32 peak performance and 0% of its fp64 peak performance.

However, the kernel actually performs many floating-point operations. I don't know what is going wrong or how to fix it.
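As a sanity check on the "0% of peak" figure, the usual roofline arithmetic can be done by hand. The sketch below uses the textbook roofline model, which is not necessarily how Nsight Compute derives its numbers. The function name `roofline_point` and every value in it (FLOP count, bytes moved, runtime, peak rates) are hypothetical placeholders, not measurements from my kernel or my device.

```python
def roofline_point(flops, bytes_moved, seconds, peak_flops, peak_bandwidth):
    """Place one kernel on a textbook roofline.

    Returns (arithmetic intensity in FLOP/byte,
             achieved FLOP/s,
             attainable FLOP/s under the roofline,
             achieved performance as a percentage of peak FLOP/s).
    """
    intensity = flops / bytes_moved                # FLOP per byte of memory traffic
    achieved = flops / seconds                     # FLOP/s the kernel delivered
    # The roofline ceiling is the lower of the compute roof and the memory roof.
    attainable = min(peak_flops, intensity * peak_bandwidth)
    percent_of_peak = 100.0 * achieved / peak_flops
    return intensity, achieved, attainable, percent_of_peak


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    intensity, achieved, attainable, pct = roofline_point(
        flops=2.0e9,            # hand-counted floating-point operations
        bytes_moved=4.0e9,      # estimated memory traffic in bytes
        seconds=1.0e-2,         # kernel duration
        peak_flops=2.0e13,      # assumed fp32 peak of the device
        peak_bandwidth=5.0e11,  # assumed memory bandwidth in bytes/s
    )
    print(f"arithmetic intensity: {intensity:.3f} FLOP/byte")
    print(f"achieved:   {achieved:.3e} FLOP/s")
    print(f"attainable: {attainable:.3e} FLOP/s")
    print(f"percent of fp32 peak: {pct:.2f}%")
```

If a hand count like this gives a clearly nonzero percentage for a kernel while the report still says 0%, that would suggest the floating-point work is not being measured, rather than not being performed.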
My ncu version is:

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.3.0.0 (build 33266684) (public-release)

My command line is:

ncu -f --launch-count 50 --set full --export profile_rep ./my_program

I searched the Internet but could not find a solution; ncu seems to work fine for everyone else. I even downloaded other people's open-source code and profiled their kernels, but the roofline analysis is still missing on my machine, while they are able to generate it. I would be very grateful if you could help me find my mistake in using ncu or solve this problem.

Upvotes: 1

Views: 289

Answers (0)
