Reputation: 1468
I have a CUDA program that I am profiling on three machines.
The first machine (a Windows 7 workstation) uses the GeForce 690 as its primary display card, in addition to doing CUDA processing. The other two machines (a Windows 7 laptop and a Linux workstation) use other graphics hardware for display rendering: integrated graphics on the laptop and a lower-end ATI card on the Linux workstation.
I have compiled the same program (with all the CUDA profiling compiler flags set) on all three platforms, and I am using nvvp to profile it. The timelines for machines #2 and #3 are what I would expect:
Windows 7 Laptop
Linux Workstation
However, the profiling timeline for the Windows Workstation is very different:
Windows 7 Workstation
I don't know how or why it happened, but the CPU and GPU computations seem to have gotten out of sync (at least as far as the profiler is concerned). Could this have something to do with the Windows 7 workstation not having an additional graphics card dedicated to display?
Upvotes: 0
Views: 702
Reputation: 11529
The NVIDIA Visual Profiler, NVIDIA Nsight Visual Studio Edition, and nvprof use a common method in the driver to synchronize the GPU timers with the CPU timers. In the NVIDIA display drivers for CUDA 5.0 and CUDA 5.5 there was a driver bug that affects timer synchronization for devices in SLI groups. Specifically, all devices in the SLI group used the timer from the first device, which results in the other devices in the SLI group displaying events at a fixed positive or negative offset from the correct location. This is relevant here because the GeForce GTX 690 is a dual-GPU card, so its two GPUs can be placed in an SLI group. The issue should be fixed in GeForce R326.41 and newer drivers.
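To illustrate what a "fixed offset" does to a timeline, here is a minimal Python sketch. It does not use any profiler API; the timestamps, the skew value, and the helper names `launch_gaps` and `apply_offset` are all hypothetical and made up for the example. It only shows that a constant timer skew shifts every GPU event by the same amount, which can make kernels appear to start before the CPU launched them.

```python
def launch_gaps(cpu_launch_ns, gpu_start_ns):
    """Per-kernel gap between the CPU-side launch and the GPU-side start."""
    return [gpu - cpu for cpu, gpu in zip(cpu_launch_ns, gpu_start_ns)]


def apply_offset(timestamps_ns, offset_ns):
    """Shift every timestamp by the same constant offset."""
    return [t + offset_ns for t in timestamps_ns]


# Hypothetical timestamps in nanoseconds (not taken from any real trace).
cpu_launch = [1_000, 5_000, 9_000]
true_gpu_start = [1_200, 5_300, 9_250]

# Assumed constant skew, standing in for a GPU whose events are
# converted with another device's timer.
skew_ns = -4_000
reported_gpu_start = apply_offset(true_gpu_start, skew_ns)

true_gaps = launch_gaps(cpu_launch, true_gpu_start)
reported_gaps = launch_gaps(cpu_launch, reported_gpu_start)

# A negative gap means a kernel appears to start before it was launched,
# which is impossible and hints at a timer problem rather than real behavior.
looks_skewed = min(reported_gaps) < 0

# Because the skew is constant, undoing it restores the original timeline.
corrected_gpu_start = apply_offset(reported_gpu_start, -skew_ns)
```

In a real trace the skew is not known in advance, so this is only a way to recognize the symptom: every event on the affected device is displaced by the same amount while the durations and ordering within that device stay intact.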
Upvotes: 2