While analyzing the performance of a kernel, I used the ncu command to generate a report. However, the report does not display the roofline analysis under the "GPU Speed Of Light Throughput" section. Instead, it shows this message:
The ratio of peak float (fp32) to double (fp64) performance on this device is 64:1. The kernel achieved 0% of this device's fp32 peak performance and 0% of its fp64 peak performance.
However, the kernel actually performs many floating-point operations. I don't know what went wrong or how to fix it.
My ncu version is:
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.3.0.0 (build 33266684) (public-release)
My command line is:
ncu -f --launch-count 50 --set full --export profile_rep ./my_program
I searched the Internet but could not find a solution; ncu seems to work fine for everyone else. I even downloaded other people's open-source code and profiled their kernels, but the roofline analysis is still missing on my machine, while they are able to generate it. I would be very grateful for any help finding my mistake in using ncu or otherwise solving this problem.