Kyle R

Reputation: 61

Sharing Raw Kernel Cache between Docker Containers

I'm creating a Python application that uses cupy.RawKernel. The application runs within a Docker container using the NVIDIA Container Toolkit. I'd like to avoid having the cupy.RawKernel recompile every time I create a new container (which happens frequently during development).

I've set up a volume mount like so:

docker run --runtime=nvidia -v ${HOME}/.cupy/kernel_cache:/home/app/.cupy/kernel_cache -d docker_image

After running the application, I see the .cubin files in their respective locations in the container and on the host. However, when I recreate the container, it still takes much longer to start up the first time it runs. The .cubin files also do not get updated on the host or in the container.

My first thought was a permissions issue, but granting full read/write permissions on the host folder had no effect.

Any thoughts? Thanks!

Upvotes: 2

Views: 378

Answers (1)

Kyle R

Reputation: 61

Turns out there is a second cache folder that needs to be shared:

~/.nv/ComputeCache
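A minimal sketch of the fix, assuming the same image name and in-container home directory as the question (docker_image, /home/app): mount the driver's JIT compute cache alongside CuPy's kernel cache so both survive container recreation.

```shell
# Share both caches with the host so neither CuPy's compiled kernels
# nor the driver's JIT compute cache are lost when the container is recreated.
docker run --runtime=nvidia \
  -v ${HOME}/.cupy/kernel_cache:/home/app/.cupy/kernel_cache \
  -v ${HOME}/.nv/ComputeCache:/home/app/.nv/ComputeCache \
  -d docker_image
```

Note the second mount targets the in-container user's home directory; if the application runs as a different user inside the container, the ~/.nv/ComputeCache path must be adjusted to that user's home.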

For more information, please see this link: https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/

JIT Caching

The second approach to mitigate JIT overhead is to cache the binaries generated by JIT compilation. When the device driver just-in-time compiles PTX code for an application, it automatically caches a copy of the generated binary code to avoid repeating the compilation in later invocations of the application. The cache—referred to as the compute cache—is automatically invalidated when the device driver is upgraded, so that applications can benefit from improvements in the just-in-time compiler built into the device driver.

Environment variables are available to control just-in-time compilation.

  • Setting CUDA_CACHE_DISABLE to 1 disables caching (no binary code is added to or retrieved from the cache).
  • CUDA_CACHE_MAXSIZE specifies the size of the compute cache in bytes; the default size is 256 MiB (since NVIDIA driver release 334, 32 MiB before), and the maximum size is 4 GiB; binary codes whose size exceeds the cache size are not cached; older binary codes are evicted from the cache to make room for newer binary codes if needed.
  • CUDA_CACHE_PATH specifies the directory location of compute cache files; the default values are:
      • on Windows, %APPDATA%\NVIDIA\ComputeCache,
      • on MacOS, $HOME/Library/Application Support/NVIDIA/ComputeCache,
      • on Linux, ~/.nv/ComputeCache
  • Setting CUDA_FORCE_PTX_JIT to 1 forces the device driver to ignore any binary code embedded in an application (see Application Compatibility) and to just-in-time compile embedded PTX code instead. If a kernel does not have embedded PTX code, it will fail to load. You can use this environment variable to confirm that an application binary contains PTX code and that just-in-time compilation works as expected to guarantee forward compatibility with future architectures.
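The variables above can be set before launching the application; for example, a sketch that redirects the compute cache to an illustrative shared path and raises the size limit to 1 GiB (both the path and the size here are assumptions, not values from the question):

```shell
# Redirect the driver's JIT compute cache to a shared location
# (illustrative path; inside a container this should point at a volume mount).
export CUDA_CACHE_PATH="$HOME/shared/ComputeCache"

# Raise the cache size limit from the 256 MiB default to 1 GiB.
export CUDA_CACHE_MAXSIZE=$((1024 * 1024 * 1024))
echo "$CUDA_CACHE_MAXSIZE"
```

Setting these in the container's environment (e.g. via `docker run -e`) is an alternative to mounting the default ~/.nv/ComputeCache location.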

Upvotes: 3
