Mary H

Reputation: 409

PyTorch RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED

I'm trying to perform some inference with YOLOv8 models, simply using the following command:

yolo detect predict source=input.jpg model=yolov8n.pt device=0

But I'm getting this error from PyTorch (my PyTorch version, as shown below, is 2.3.0+cu118):

RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

I searched a lot, found the CUDAContext.cpp file, tried fixing it, etc., but couldn't find a solution.

What is the problem, and how can I fix it?

The outputs below, checking PyTorch and GPU availability, look fine:

Python 3.9.7 | packaged by conda-forge | (default, Sep  2 2021, 17:58:34) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.zeros(2).cuda(0)
tensor([0., 0.], device='cuda:0')
>>> print(torch.__version__)
2.3.0+cu118
>>> print(f"Is CUDA available?: {torch.cuda.is_available()}")
Is CUDA available?: True
>>> print(f"Number of CUDA devices: {torch.cuda.device_count()}")
Number of CUDA devices: 3
>>> device = torch.device('cuda')
>>> print(f"A torch tensor: {torch.rand(5).to(device)}")
A torch tensor: tensor([0.6085, 0.7618, 0.6855, 0.5276, 0.1606], device='cuda:0')

Full stack trace:

Traceback (most recent call last):
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    queued_call()
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 174, in _check_capability
    capability = get_device_capability(d)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 448, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/conda/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/conda/lib/python3.9/site-packages/ultralytics/cfg/__init__.py", line 583, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "/home/conda/lib/python3.9/site-packages/ultralytics/engine/model.py", line 528, in val
    validator(model=self.model)
  File "/home/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/conda/lib/python3.9/site-packages/ultralytics/engine/validator.py", line 126, in __call__
    device=select_device(self.args.device, self.args.batch),
  File "/home/conda/lib/python3.9/site-packages/ultralytics/utils/torch_utils.py", line 156, in select_device
    p = torch.cuda.get_device_properties(i)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

CUDA call was originally invoked at:

  File "/home/conda/bin/yolo", line 5, in <module>
    from ultralytics.cfg import entrypoint
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/data/__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/data/base.py", line 15, in <module>
    from torch.utils.data import Dataset
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/torch/__init__.py", line 1478, in <module>
    _C._initExtension(manager_path())
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 238, in <module>
    _lazy_call(_check_capability)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 235, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

Upvotes: 4

Views: 4146

Answers (4)

Reagen

Reputation: 31

I tried all three solutions above and none worked for me, but luckily I found one on CSDN. As the linked post states, just export two variables before the actual Python run. It works:

export CUDA_VISIBLE_DEVICES=1,2
export CUDA_DEVICE_ORDER=PCI_BUS_ID
nohup python data_train.py > log/log.txt 2>&1 

Upvotes: 1

xdever

Reputation: 71

A nicer solution is to call torch.cuda.device_count.cache_clear() after manually setting CUDA_VISIBLE_DEVICES.

EDIT: what I mean by "manually setting CUDA_VISIBLE_DEVICES" is calling os.environ["CUDA_VISIBLE_DEVICES"] = ...
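
For example, a minimal sketch of what that looks like (assuming one of the affected PyTorch versions, where device_count is wrapped in functools.lru_cache and therefore exposes cache_clear()):

import os
import torch

# Change the visible devices after torch is imported but before any
# CUDA initialization has happened.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

# device_count() was cached by functools.lru_cache, so clear the cache
# to force PyTorch to re-count under the new mask.
torch.cuda.device_count.cache_clear()
print(torch.cuda.device_count())  # now reflects CUDA_VISIBLE_DEVICES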

Upvotes: 3

K. Bogdan

Reputation: 535

Just complementing the accepted answer: this does not seem to be entirely fixed, although a proposed fix has already been merged into PyTorch (it is not in v2.3.1). I am not sure about all the scenarios, but others are described in issues #10730 and #126344. I was hitting a similar error, but inside Docker with PyTorch '2.3.1+cu121', running a script that sets the GPU internally, as in:

docker exec -it container_name python test_gpu.py --gpu_id="1"

I am in a scenario where I cannot change the python3.xx\site-packages\torch\cuda\__init__.py file. Also, checking GPU availability directly was fine (the GPU was found by PyTorch and I could put a tensor on it, as in the question's example), and nvidia-smi was fine too, listing all available devices.

From the PyTorch issue discussion, this seems to be related to device-count caching. One possible solution is to set the CUDA_VISIBLE_DEVICES variable before importing torch, as suggested in https://github.com/pytorch/pytorch/issues/126344#issuecomment-2118159363:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
os.environ["WORLD_SIZE"] = "1"
import torch

But when you cannot control the import order either (e.g., when using external packages), you can try setting CUDA_VISIBLE_DEVICES when calling the script or command, as in:

CUDA_VISIBLE_DEVICES=1 python test_gpu.py --gpu_id="1"

or with docker:

docker exec -it -e CUDA_VISIBLE_DEVICES=1 container_name python test_gpu.py --gpu_id="1"

Upvotes: 2

Mary H

Reputation: 409

This was a bug in PyTorch. To solve it, go to python3.xx\site-packages\torch\cuda\__init__.py and modify the device_count function.

Remove or comment out the old function:

@lru_cache(maxsize=1)
def device_count() -> int:
    r"""Return the number of GPUs available."""
    if not _is_compiled():
        return 0
    # bypass _device_count_nvml() if rocm (not supported)
    nvml_count = -1 if torch.version.hip else _device_count_nvml()
    return torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count

And replace it with this new code:

_cached_device_count: Optional[int] = None

def device_count() -> int:
    r"""Return the number of GPUs available."""
    global _cached_device_count
    if not _is_compiled():
        return 0
    if _cached_device_count is not None:
        return _cached_device_count
    # Check if using ROCm (HIP)
    if torch.version.hip:
        nvml_count = -1  # Assuming ROCm is not supported, set nvml_count to -1
    else:
        nvml_count = _device_count_nvml()  # Use NVML for NVIDIA GPUs
    r = torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count

    # NB: Do not cache the device count prior to CUDA initialization, because
    # the number of devices can change due to changes to CUDA_VISIBLE_DEVICES
    # setting prior to CUDA initialization.
    if _cached_device_count is None and _initialized:
        _cached_device_count = r
    return r
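
After applying the patch (or upgrading to a PyTorch release that includes the merged fix), a quick sanity check, assuming a multi-GPU machine, is that the reported count now follows a mask set before CUDA initializes:

import os

# Hypothetical mask: expose only one of the GPUs before CUDA initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())  # 1, not the machine's full GPU count
print(torch.zeros(2).cuda(0))     # lands on the single visible device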

Upvotes: 8
