Roger Dahl

Reputation: 15734

Resetting GPU and driver after CUDA error

Sometimes, bugs in my CUDA programs cause the desktop graphics to break (in Windows). Typically, the screen remains somewhat readable, but when graphics change, such as when dragging a window, lots of semi-random colored pixels and small blocks appear.

I have tried to reset the GPU and driver by changing the desktop resolution, but that doesn't help. The only fix I have found is to reboot the computer.

Is there a program out there or some trick I can use to get the driver and GPU to reset without rebooting?

Upvotes: 28

Views: 83622

Answers (7)

fuzzyTew

Reputation: 3768

This happens to me on Linux when I hibernate, so I created a script to resolve the problem. On resume from hibernation, it terminates all processes using nvidia_uvm and then reloads the module (as fraank suggests).

I have this in /usr/lib/systemd/system-sleep/fix-nvidia-uvm:

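#!/usr/bin/env bash
# systemd-sleep calls hooks with $1 = pre|post and $2 = the sleep type
case "$2" in
hibernate)
    case "$1" in
    post)
        # After resume: kill everything holding /dev/nvidia-uvm and wait until it is free
        echo "$0 $*: terminate processes using nvidia_uvm"
        fuser --kill /dev/nvidia-uvm
        while fuser --silent /dev/nvidia-uvm; do sleep 1; done
        # Then unload and reload the module
        echo "$0 $*: reload nvidia_uvm"
        modprobe -r nvidia_uvm && modprobe nvidia_uvm
    ;;
    esac
;;
esac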

Then make the script executable:

$ sudo chmod 755 /usr/lib/systemd/system-sleep/fix-nvidia-uvm

Upvotes: 2

debrises

Reputation: 105

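  1. Run ps -ef to list all processes.
  2. Find the stuck process, e.g. a line like root 4066644 1 99 08:56 ? 04:32:25 /opt/conda/bin/python /data/ (the PID is the second column, here 4066644).
  3. Kill it: kill -9 4066644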
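
The steps above can also be scripted. The sketch below is an alternative to reading the ps -ef output by eye, not the answerer's method: it assumes nvidia-smi is installed, asks the driver for the PIDs of compute processes with nvidia-smi --query-compute-apps=pid --format=csv,noheader, and then sends them SIGKILL as in step 3. The function names gpu_compute_pids and kill_gpu_compute_processes are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: kill the compute processes the NVIDIA driver reports,
# instead of picking the PID out of `ps -ef` by hand.

# Print one PID per line for every compute process on the GPU(s).
gpu_compute_pids() {
  # --format=csv,noheader prints bare values; keep only lines that look like PIDs
  nvidia-smi --query-compute-apps=pid --format=csv,noheader |
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Send SIGKILL (like `kill -9`) to every PID reported above.
kill_gpu_compute_processes() {
  local pids
  pids=$(gpu_compute_pids)
  if [ -z "$pids" ]; then
    echo "no compute processes found"
    return 0
  fi
  echo "killing: $(echo $pids)"
  # $pids is left unquoted on purpose so each PID becomes its own argument
  kill -9 $pids
}

# Usage: kill_gpu_compute_processes
```

Killing processes owned by other users needs root. Graphics clients such as an X server are not listed as compute apps, so this should leave them alone.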

Upvotes: 1

fraank

Reputation: 799

The same problem sometimes occurs on Unix too, and Google sent me to this thread, so I hope this helps somebody else.

On Ubuntu, unloading and reloading the nvidia_uvm kernel module solved the problem for me:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
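
rmmod refuses to unload a module whose use count is not zero (it reports that the module is in use), so the reload only works once nothing holds nvidia_uvm. A minimal sketch, assuming it runs as root, that checks the "Used by" count from lsmod before reloading; uvm_use_count and reload_uvm are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the rmmod/modprobe commands above. Run as root.

# Print nvidia_uvm's use count (third column of lsmod), or nothing if not loaded.
uvm_use_count() {
  lsmod | awk '$1 == "nvidia_uvm" { print $3 }'
}

# Reload nvidia_uvm, but only when nothing holds a reference to it.
reload_uvm() {
  local count
  count=$(uvm_use_count)
  if [ -z "$count" ]; then
    # Not loaded at all: loading it is all that is needed
    modprobe nvidia_uvm
    return
  fi
  if [ "$count" -gt 0 ]; then
    echo "nvidia_uvm is in use (use count $count); stop the processes holding it first, e.g. those shown by fuser -v /dev/nvidia-uvm" >&2
    return 1
  fi
  rmmod nvidia_uvm && modprobe nvidia_uvm
}
```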

Upvotes: 39

Ava Assadi

Reputation: 1

  1. In Device Manager, under Display adapters, find the graphics adapter.
  2. Disable it.
  3. Press Win+Ctrl+Shift+B (the monitor will blink).
  4. Enable it again.

There you go.

Upvotes: 0

Matija Grcic

Reputation: 13381

To reset the graphics stack in Windows, press Win+Ctrl+Shift+B.

Upvotes: 7

harrism

Reputation: 27809

Edit:

If you are on Tesla hardware on Linux and can run nvidia-smi, then you can reset the GPU using

nvidia-smi -r

or

nvidia-smi --gpu-reset

Here is the man page entry for this switch:

Resets GPU state. Can be used to clear double bit ECC errors or recover hung GPU. Requires -i switch to target specific device. Available on Linux only.
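
Because the man entry says the reset needs -i to target a device, a thin wrapper can enforce that. A minimal sketch: reset_gpu is a hypothetical name, it only accepts a numeric index as listed by nvidia-smi -L, and it must run as root. Newer nvidia-smi man pages also state that no application (CUDA programs, an X server, another nvidia-smi instance) may be using the device during the reset.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: reset one GPU with nvidia-smi. Run as root.
reset_gpu() {
  local index="$1"
  # This sketch only accepts a 0-based index (nvidia-smi -i also takes a UUID or PCI bus ID)
  case "$index" in
    ''|*[!0-9]*)
      echo "usage: reset_gpu GPU_INDEX (see: nvidia-smi -L)" >&2
      return 2
      ;;
  esac
  nvidia-smi --gpu-reset -i "$index"
}

# Usage (as root): reset_gpu 0
```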

Otherwise, the only way to truly reset the hardware is to reboot.

What you describe shouldn't happen. I recommend testing with different hardware; let us know if it still occurs.

Upvotes: 18

jorge

Reputation: 21

I have a GeForce GTX 260 with the NVIDIA GPU SDK 4.2 and I am experiencing the same problem. Sometimes while developing I have bugs in my programs, and these cause the screen to show the random colored pixels described in this post.

As stated here, changing the resolution does not make them disappear. However, if I change only the colour depth from 32 to 16 bits, the random colored pixels disappear, but going back to 32 bits (without rebooting) makes them appear again. The last bug that caused this behaviour was declaring data as __constant__ memory but passing it to the kernel as a pointer:

test<<<grid, threadsPerBlock>>>( cuda_malloc_data, cuda_constant_data );

If I do not pass cuda_constant_data, then there is no bug (and consequently the random coloured pixels do not appear).

Upvotes: 2
