sav

Reputation: 2150

How do I use OpenCL in a Docker container?

I have successfully used OpenCL on my local Windows PC, and I would now like to get my program working in a container.

First attempt

FROM ubuntu:latest

RUN apt-get update 

#Make installs non-interactive (ENV persists across layers, unlike RUN export)
ENV DEBIAN_FRONTEND=noninteractive
RUN ln -fs /usr/share/zoneinfo/Australia/Brisbane /etc/localtime
RUN apt-get install -y tzdata
RUN dpkg-reconfigure --frontend noninteractive tzdata

#Install packages that seem like they might help
RUN apt-get upgrade -y
RUN apt-get install -y --no-install-recommends build-essential clinfo cmake gcc make nvidia-modprobe ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-headers virtualenv wget 
RUN apt-get install -y --no-install-recommends libnvidia-compute-565-server 
RUN apt-get install -y python3.12-venv

#Add NVIDIA package repo
RUN rm -rf /var/lib/apt/lists/*
RUN rm -f /etc/apt/sources.list.d/cuda.list
RUN echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 /" > /etc/apt/sources.list.d/cuda.list
RUN apt-get update
RUN apt-get upgrade -y

#Make python virtual environment
WORKDIR /
RUN mkdir /venv
ENV VIRTUAL_ENV=/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip
RUN pip3 install numpy conan siphash24 pyopencl

Building and running it:

docker build . -t test_ocl 
docker run -it --gpus all test_ocl /bin/bash 

clinfo output:

root@7307c8f6cf60:/# clinfo
Number of platforms                               0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

nvidia-smi

root@7307c8f6cf60:/# nvidia-smi
Thu Feb  6 16:23:01 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.09              Driver Version: 571.96         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A1000 6GB Lap...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   39C    P8              5W /   35W |     194MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Looking for the platforms using Python:

root@7307c8f6cf60:/# /venv/bin/python
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyopencl as cl
>>> cl.get_platforms()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR
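
For context, a sketch of what I believe is going on (my own reading, not something confirmed above): the ocl-icd loader only reports platforms for drivers registered under /etc/OpenCL/vendors/. The NVIDIA container runtime mounts the driver's libnvidia-opencl.so.1 into the container when NVIDIA_DRIVER_CAPABILITIES includes compute, but nothing creates the vendor file that tells the ICD loader about it. Something like this might make the platform appear:

#Run with explicit driver capabilities
docker run -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility test_ocl /bin/bash

#Inside the container: register the injected driver library with the ICD loader
mkdir -p /etc/OpenCL/vendors
echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
clinfo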

Second attempt

docker run --rm --gpus all nvidia/opencl clinfo
Number of platforms                               0
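
One diagnostic I'd try at this point (my own suggestion, not part of the original attempt): check whether the runtime injected the driver's OpenCL library at all, since 0 platforms can also just mean the vendor file is missing, as described above:

docker run --rm --gpus all nvidia/opencl sh -c "ldconfig -p | grep -i opencl; ls /etc/OpenCL/vendors"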

Third attempt

Using ROCm

docker run -it --gpus all rocm/dev-ubuntu-22.04 /bin/bash

clinfo

root@f561a9533509:/# clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3635.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0

At least clinfo found a platform this time, even though it reports zero devices.

apt update 
apt upgrade -y
pip3 install siphash24 pyopencl
root@f561a9533509:/# python3
Python 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyopencl as cl
>>> cl.get_platforms()
[<pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7fa26e577010>]

It found the platform:

>>> p = cl.get_platforms()[0]
>>> p.get_devices()
[]
>>>

But no devices.
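
As far as I can tell, this is expected: --gpus drives the NVIDIA container runtime, while ROCm containers normally get the AMD GPU through device mounts instead, e.g.:

docker run -it --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-22.04 clinfo

Even then it would only find a device on a host with an AMD GPU and the amdgpu/ROCm kernel driver; this machine has an NVIDIA card, so zero devices here is unsurprising.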


Fourth attempt

 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA RTX A1000 6GB Laptop GPU]
20480 bodies, total time for 10 iterations: 19.150 ms
= 219.026 billion interactions per second
= 4380.514 single-precision GFLOP/s at 20 flops per interaction

Perhaps this successfully sent commands to the GPU, although I'm not sure yet how I can replicate that in my own code.
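
If the ICD registration sketched under the first attempt works, a minimal smoke test from my own code could look like this (using pyopencl as before; the reported names will depend on the driver):

python3 -c "import pyopencl as cl; p = cl.get_platforms()[0]; print(p.name, [d.name for d in p.get_devices()])"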

Upvotes: -1

Views: 137

Answers (3)

sav

Reputation: 2150

If anyone else wants to know what worked for me, see

https://github.com/umarwaseeem/OpenCL-and-Docker

I was missing pocl-opencl-icd. (Note that PoCL is primarily a CPU OpenCL implementation, so this registers a CPU platform rather than the NVIDIA GPU.)

FROM ubuntu:24.04

RUN apt-get update && \
    apt-get install -y pocl-opencl-icd ocl-icd-opencl-dev gcc clinfo

WORKDIR /app

COPY host.c /app/
COPY Makefile /app/
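
To check that the PoCL platform actually shows up, the image can be built and run like this (the tag test_pocl is my own choice; no --gpus flag is needed for a CPU platform):

docker build . -t test_pocl
docker run --rm test_pocl clinfo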

Upvotes: 0

sav

Reputation: 2150

Partial answer

Consider this example:

Dockerfile

FROM tensorflow/tensorflow:latest-gpu

RUN apt-get update && apt-get install -y python3-pip

COPY . /app
WORKDIR /app

RUN pip3 install tensorflow[and-cuda]

CMD ["python3", "example.py"]

example.py

import tensorflow as tf

# Check if GPU is available
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

# Perform a simple computation using the GPU
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
c = tf.matmul(a, b)

print("Result of matrix multiplication:\n", c)

# Define tensors
d = tf.constant([1.0, 2.0, 3.0, 4.0])
e = tf.constant([0.5, 0.5, 0.5, 0.5])


# Perform element-wise addition on GPU
print("Addition")
with tf.device('/GPU:0'):
    f = d + e

print(f)

Running it:

docker build -t tensorflow-gpu-example .

docker run --gpus all -it tensorflow-gpu-example

2025-02-11 07:00:30.848469: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1739257230.863116       1 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739257230.867928       1 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-11 07:00:30.883083: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Num GPUs Available:  1
I0000 00:00:1739257233.184817       1 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3620 MB memory:  -> device: 0, name: NVIDIA RTX A1000 6GB Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Result of matrix multiplication:
 tf.Tensor(
[[1. 3.]
 [3. 7.]], shape=(2, 2), dtype=float32)
Addition
tf.Tensor([1.5 2.5 3.5 4.5], shape=(4,), dtype=float32)

This also seems to demonstrate that the GPU can be used in Docker; I'm just having trouble using OpenCL specifically. (TensorFlow reaches the GPU through CUDA, not OpenCL.)
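
A quick way to confirm that (my own diagnostic, not from the original post) is to look for registered OpenCL ICDs inside the TensorFlow image; if the directory is empty or absent, the loader has nothing to enumerate:

docker run --rm --gpus all tensorflow/tensorflow:latest-gpu bash -c "ls /etc/OpenCL/vendors/ 2>/dev/null || echo 'no ICD files registered'"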

Upvotes: 0

sav

Reputation: 2150

I haven't quite got OpenCL working, but it seems I have managed to use the GPU inside Docker.

src/gpu.py

from numba import cuda
import numpy as np

@cuda.jit
def vector_add(a, b, c):
    idx = cuda.grid(1)
    if idx < a.size:
        c[idx] = a[idx] + b[idx]

def main():
    # Size of vectors
    n = 1000000

    # Initialize host vectors
    h_a = np.random.rand(n).astype(np.float32)
    h_b = np.random.rand(n).astype(np.float32)
    h_c = np.zeros(n, dtype=np.float32)

    # Allocate device memory
    d_a = cuda.to_device(h_a)
    d_b = cuda.to_device(h_b)
    d_c = cuda.device_array(n, dtype=np.float32)

    # Configure the blocks and grids
    threads_per_block = 256
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block

    # Launch the kernel
    vector_add[blocks_per_grid, threads_per_block](d_a, d_b, d_c)

    # Copy the result back to the host
    h_c = d_c.copy_to_host()

    # Verify the result
    if np.allclose(h_c, h_a + h_b):
        print("Success!")
    else:
        print("Error!")

if __name__ == "__main__":
    main()

Dockerfile

FROM ubuntu:24.04

RUN apt update
RUN apt upgrade -y

RUN apt install -y curl wget 

ENV INSTALLER_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
ENV INSTALLER_FILE="Miniconda3-latest-Linux-x86_64.sh"
ENV INSTALL_DIR="/miniconda3"

WORKDIR /home/

RUN curl -o $INSTALLER_FILE $INSTALLER_URL
RUN chmod 777 $INSTALLER_FILE

RUN ./$INSTALLER_FILE -b -p $INSTALL_DIR

ENV PATH=$INSTALL_DIR/bin:$PATH

RUN conda config --set ssl_verify false
RUN pip config set global.trusted-host "pypi.org files.pythonhosted.org pypi.python.org"
RUN conda config --append channels conda-forge

RUN conda install -y numba
RUN conda install -y cudatoolkit

COPY src/gpu.py /home/gpu.py

Building it:

docker build . -t numba_test 

Running it:

docker run --rm -it --gpus=all numba_test python /home/gpu.py
Success!
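
As an extra sanity check that Numba really sees the device (my own addition, not part of the original run), numba.cuda.detect() prints the GPUs it found:

docker run --rm -it --gpus=all numba_test python -c "from numba import cuda; cuda.detect()"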

Upvotes: 0
