davidA

Reputation: 13664

Python: running docker containers with sh and handling UTF-8 decoding errors

I have a Python program that is directly executed by Jenkins. This program uses the sh library to execute a docker container, via this function. Note that it is an important feature of this function that it display the subprocess's output as it executes:

import sys
import sh

def run_command(*args, **kwargs):

    # pass the parent stream 'tty' state to the command:
    tty_in = sys.stdin.isatty()
    tty_out = sys.stdout.isatty()

    run = sh.Command(args[0])
    try:
        for line in run(args[1:], _err=sys.stdout, _iter=True, _tty_in=tty_in, _tty_out=tty_out):
            sys.stdout.write(line)
            sys.stdout.flush()
    except sh.ErrorReturnCode:
        # (exception handling elided in the original post)
        raise

As per the comments, docker run requires a TTY for input, so the keyword argument _tty_in is set to match whatever the state of stdin is. However when running under Jenkins, it is False.
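The TTY state that gets forwarded can be checked with a snippet like this; the values it prints depend on where it runs (True in an interactive shell, False under Jenkins, where stdin and stdout are pipes):

```python
import sys

# Under Jenkins, stdin and stdout are pipes rather than terminals, so both
# isatty() calls report False; in an interactive shell they report True.
tty_in = sys.stdin.isatty()
tty_out = sys.stdout.isatty()
print(tty_in, tty_out)
```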

However, the issue arises with UTF-8-encoded error messages from programs running within the container, such as cp. This produces errors like:

cp: cannot stat \xe2\x80\x98filename...

It turns out those three bytes are the UTF-8 encoding of the left single quotation mark (U+2018) that cp uses when the locale is UTF-8. If I set the locale to "C" before running cp directly, I can see that it uses plain ASCII quotes instead.
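For illustration, those three bytes decode cleanly as UTF-8 but fail as ASCII, which is exactly the failure mode in the traceback below:

```python
raw = b'\xe2\x80\x98'

# Decoding as UTF-8 yields the left single quotation mark (U+2018)...
assert raw.decode('utf-8') == u'\u2018'

# ...while decoding as ASCII raises, because 0xe2 is outside range(128).
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)
```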

When my Python script encounters these errors, it dies with the following:

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1484, in output_thread
    done = stream.read()
  File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1974, in read
    self.write_chunk(chunk)
  File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1949, in write_chunk
    self.should_quit = self.process_chunk(chunk)
  File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1847, in process
    handler.write(chunk)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 16: ordinal not in range(128)

This suggests to me that the sh module is expecting ascii output from the subprocess, but is receiving UTF-8 and is unable to decode it.

I found the _encoding and _decode_errors options for sh, and although they do affect the locale that cp sees when run directly by sh, they do not appear to carry through to programs running within the docker container. They do, however, allow my program to continue, because the decoding errors are then skipped rather than raising an exception.
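The effect of _decode_errors can be reproduced with plain byte decoding; sh hands that value to the codec as its errors argument, so 'replace' substitutes U+FFFD for each undecodable byte instead of raising (a sketch of the behaviour, not sh's internals verbatim):

```python
line = b'cp: cannot stat \xe2\x80\x98filename'

# Strict ASCII decoding fails on the first UTF-8 byte, as in the traceback.
try:
    line.decode('ascii')
except UnicodeDecodeError as exc:
    print('strict:', exc)

# With errors='replace', each offending byte becomes U+FFFD and processing
# continues -- this is the behaviour that _decode_errors='replace' selects.
print('replace:', line.decode('ascii', errors='replace'))
```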

I would prefer to understand the situation better and implement a proper solution. Can anyone explain what is actually going on here, step by step (Jenkins > Python > sh > Docker > Bash)?

I'm using Python 2.7.12 with Jenkins 2.33.

Upvotes: 1

Views: 1253

Answers (1)

niwatolli3

Reputation: 51

I had a similar problem when running a Python script in a docker container.

I solved it with the following procedure.

(Step 1) Add the following line to the Dockerfile before the python command is run:

ENV LANG C.UTF-8
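For completeness, the same variable can also be injected per run instead of baking it into the image, e.g. with docker's standard -e flag (`docker run -e LANG=C.UTF-8 ...`; the rest of the command line depends on your setup). What the ENV line provides can be sketched on the Python side like this:

```python
import os

# What ENV LANG C.UTF-8 gives processes inside the container: LANG is present
# in their environment, so glibc programs like cp select a UTF-8 locale.
env = dict(os.environ, LANG='C.UTF-8')
print(env['LANG'])
```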

Upvotes: 2
