Reputation: 75
I’m trying to run CNN training from Graphcore’s examples repo as a non-root user in Graphcore’s TensorFlow 1.15 Docker image, but it throws the following error:
2020-04-23 11:17:32.960014: I tensorflow/compiler/jit/xla_compilation_cache.cc:250] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Saved checkpoint to ./logs/RN152_bs1x16p_GN32_16.16_v1.1.11_6LT/ckpt-0
2020-04-23 11:19:07.615030: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at xla_ops.cc:361 : Unknown: [Error][Build graph] could not get temporary file for model 'MappedCodelet_%%%%%%%%%%%%%%.cpp': Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: [Error][Build graph] could not get temporary file for model 'MappedCodelet_%%%%%%%%%%%%%%.cpp': Permission denied
  [[{{node cluster}}]]
The program works fine when I run it as the root user, but as soon as I create a new user and run it as that user, it throws this error. Does this mean Graphcore’s Docker images only work if you’re using root?
Upvotes: 3
Views: 122
Reputation: 176
It is possible to run IPU programs as a non-root user. You're seeing this behaviour because switching user within a running Docker container (as in any Ubuntu-based environment) resets the environment variables. These variables contain important IPU configuration settings required to attach to and run a program on an IPU. You can avoid this by doing your user management in a Dockerfile instead. Below is a sample snippet (where examples is a clone of https://github.com/graphcore/examples/):
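# Start from Graphcore's TensorFlow 1 image
FROM graphcore/tensorflow:1
# Use a UTF-8 locale
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Create the non-root user at build time
RUN adduser [username]
# Add the local examples clone to the image and hand ownership to that user
ADD examples examples
RUN chown [username] -R examples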
Then you can build the image with:
docker image build . -t graphcore-examples
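Now you have three options for running the CNN training as a non-root user:
1. Pass the user and the training command directly to gc-docker:
gc-docker -- -ti -u [username] graphcore-examples python3 /examples/applications/tensorflow/cnns/training/train.py
2. Start the container as the non-root user, then run the training from its shell:
gc-docker -- -ti -u [username] graphcore-examples
$ python3 /examples/applications/tensorflow/cnns/training/train.py
3. Start the container as root, then switch user while preserving the environment:
gc-docker -- -ti graphcore-examples
$ su --preserve-environment - [username]
$ python3 /examples/applications/tensorflow/cnns/training/train.py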
I’d recommend using option 1 or 2 where possible. You can find more information about the gc-docker command line tool in Graphcore's documentation.
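To illustrate why the environment matters, here is a minimal sketch of what an environment reset does to a setting that a child process would otherwise inherit. DEMO_IPU_SETTING is a hypothetical stand-in (the real IPU variable names aren't listed in this answer), and env -i only approximates what a plain user switch does; it is not the exact mechanism su uses.

```shell
#!/bin/sh
# Hypothetical variable standing in for the IPU settings the container sets up.
export DEMO_IPU_SETTING="configured"

# A child process started normally inherits the exported variable.
inherited=$(/bin/sh -c 'echo "${DEMO_IPU_SETTING:-unset}"')

# env -i starts the child with an empty environment, which roughly mimics
# an environment reset when switching user without preserving it.
reset=$(env -i /bin/sh -c 'echo "${DEMO_IPU_SETTING:-unset}"')

echo "inherited environment: $inherited"
echo "reset environment:     $reset"
```

The first line should report configured and the second unset. If you suspect the same thing is happening in your container, comparing the output of env as root and as the new user would be one way to see which settings were lost.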
Upvotes: 4