Luc
Luc

Reputation: 445

Why can't I overlap asynchronous memcpy with kernel execution on fermi on win7 and CUDA 5.0?

I cannot even achieve overlapping memcpy and kernel execution with the simpleStreams example in the CUDA SDK, let alone in my own programs. These threads argue it is a problem with the WDDM driver in windows:

and suggest to:

This thread argues it is a bug in fermi:

This thread:

proposes a solution to mitigate the problems with WDDM on windows. However, it only works for a Tesla card and it requires an additional video card to steer the display, since the proposed drivers are compute-only drivers.

However, none of these threads provide a real solution. I would appreciate it, if NVIDIA could comment on this problem and come up with a solution, since apparently a lot of people are experiencing this problem.

Upvotes: 2

Views: 1303

Answers (1)

Aperture Laboratories
Aperture Laboratories

Reputation: 264

TL;DR: The issue is caused by the WDDM TDR delay option in Nsight Monitor! When set to false, the issue appears. Instead, if you set the TDR delay value to a very high number, and the "enabled" option to true, the issue goes away.

Read below for other (older) steps followed until i came to the solution above, and some other possible causes.

I just recently were able to mostly solve this problem! It is specific to windows and aero i think. Please try these steps and post your results to help others! I have tried it on GTX 650 and GT 640.

Before you do anything, consider using both onboard gpu(as display) and the discrete gpu (for computations), because there are verified issues with the nvidia driver for windows! When you use onboard gpu, said drivers don't get fully loaded, so many bugs are evaded. Also, system responsiveness is maintained while working!

  1. Make sure your concurrency problem is not related to other issues like old drivers (including bios), wrong code, incapable device, etc.
  2. Go to computer>properties
  3. Select advanced system settings on the left side
  4. Go to the Advanced tab
  5. On Performance click settings
  6. In the Visual Effects tab, select the "adjust for best performance" bullet.

This will disable aero and almost all visual effects. If this configuration works, you can try enabling one-by-one the boxes for visual effects until you find the precise one that causes problems!

Alternatively, you can:

  1. Right click on desktop, select personalize
  2. Select a theme from basic themes, that doesn't have aero.

This will also work as the above, but with more visual options enabled. For my two devices, this setting also works, so i kept it.

Please, when you try these solutions, come back here and post your findings!

For me, it solved the problem for most cases (a tiled dgemm i have made),but NOTE THAT i still can't run "simpleStreams" properly and achieve concurrency...

UPDATE: The problem is fully solved with a new windows installation!! The previous steps improved the behavior for some cases, but a fresh install solved all the problems!

I will try to find a less radical way of solving this problem, maybe restoring just the registry will be enough.

Upvotes: 2

Related Questions