Why can’t I run IPU programs as non-root in Docker containers?

Question

I’m trying to run CNN training from Graphcore’s examples repo as a non-root user from Graphcore’s TensorFlow 1.5 Docker image, but it’s throwing:

2020-04-23 11:17:32.960014: I tensorflow/compiler/jit/xla_compilation_cache.cc:250] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Saved checkpoint to ./logs/RN152_bs1x16p_GN32_16.16_v1.1.11_6LT/ckpt-0
2020-04-23 11:19:07.615030: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at xla_ops.cc:361 : Unknown: [Error][Build graph] could not get temporary file for model 'MappedCodelet_%%%%%%%%%%%%%%.cpp': Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list,run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: [Error][Build graph] could not get temporary file for model 'MappedCodelet_%%%%%%%%%%%%%%.cpp': Permission denied
  [[{{node cluster}}]]

The program works fine when I run it as the root user, but when I create a new user and run it as that user, it throws this error. Does this mean Graphcore’s Docker images only work if you’re using root?

Marie-Anne LM · Accepted Answer

It is possible to run IPU programs as a non-root user. You're seeing this behaviour because switching user within a running Docker container (or any Ubuntu-based environment) resets the environment variables. These variables contain important IPU configuration settings required to attach to and run a program on an IPU.

You can avoid this by doing your user management in a Dockerfile instead. Below is a sample snippet (where examples is a clone of https://github.com/graphcore/examples/):

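# Start from Graphcore's TensorFlow 1 image
FROM graphcore/tensorflow:1
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Create the non-root user at build time
RUN adduser [username]
# Copy the cloned examples repo into the image and give the user ownership
ADD examples examples
RUN chown -R [username] examples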

Then you can build the image with:

docker image build . -t graphcore-examples

Now you have three options to run the CNN training as a non-root user:

1. Run the CNN training directly:

gc-docker -- -ti -u [username] graphcore-examples python3 /examples/applications/tensorflow/cnns/training/train.py

2. Launch the container into a bash shell as the non-root user and then run the training from there:

gc-docker -- -ti -u [username] graphcore-examples
$ python3 /examples/applications/tensorflow/cnns/training/train.py

3. Launch the container as root and then preserve the environment when switching user:

gc-docker -- -ti graphcore-examples
$ su --preserve-environment - [username]
$ python3 /examples/applications/tensorflow/cnns/training/train.py
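
With the last option, which switches user inside a running container, it can be worth checking that nothing was lost in the switch. The sketch below is one way to do that, not an official Graphcore tool: it compares two saved `env` outputs and lists the variable names missing from the second. The helper name missing_vars and the variable DEMO_IPU_SETTING are made up for illustration, and the snapshots here are faked with printf so the example runs anywhere.

```shell
# Hypothetical helper: print the names of variables that appear in the first
# env snapshot ($1) but not in the second ($2).
# Limitation: values containing newlines are not handled.
missing_vars() {
    names_after=$(mktemp)
    # Keep only the NAME part of each NAME=value line.
    cut -d= -f1 "$2" | sort -u > "$names_after"
    # Print names from the first snapshot that have no exact match in the second.
    cut -d= -f1 "$1" | sort -u | grep -vxF -f "$names_after"
    rm -f "$names_after"
}

# Fake snapshots standing in for `env` output before and after a user switch.
snap_root=$(mktemp)
snap_user=$(mktemp)
printf 'PATH=/usr/bin\nDEMO_IPU_SETTING=1\nHOME=/root\n' > "$snap_root"
printf 'PATH=/usr/bin\nHOME=/home/demo\n' > "$snap_user"

missing=$(missing_vars "$snap_root" "$snap_user")
echo "$missing"

rm -f "$snap_root" "$snap_user"
```

In a real container you would create the snapshots yourself, for example by redirecting `env` to a file as root and again as the new user, using a location both users can access.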

I’d recommend using option 1 or 2 where possible. More information about the gc-docker command line tool is available in its documentation.
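
The effect of an environment reset on a child process can be reproduced without Docker or an IPU. This is a minimal sketch under two assumptions: DEMO_IPU_SETTING is a made-up variable standing in for the real IPU settings (which the answer does not name), and `env -i` is used as a rough stand-in for the reset, since it clears the whole environment rather than mimicking exactly what a user switch does.

```shell
# DEMO_IPU_SETTING is a made-up stand-in for the real IPU settings.
export DEMO_IPU_SETTING=example

# A child process started normally inherits the variable.
kept=$(/bin/sh -c 'echo "${DEMO_IPU_SETTING:-unset}"')

# env -i starts the child with an empty environment, a rough stand-in for
# the reset that happens when switching user.
dropped=$(env -i /bin/sh -c 'echo "${DEMO_IPU_SETTING:-unset}"')

echo "inherited:   $kept"
echo "after reset: $dropped"
```

The first child sees the value and the second does not, which is the same situation the training script is in when it starts after a user switch that did not preserve the environment.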
