D.W.

Reputation: 3615

pip UnicodeDecodeError: 'utf8' codec can't decode byte

I run pip, and I always get the following error, no matter what flags I pass to pip:

$ pip --version
Traceback (most recent call last):
    [...irrelevant details omitted...]
      File "/usr/lib64/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 203: invalid start byte

What's going on? How do I fix this?

I have pip version 8.0.2 installed. Changing or clearing the LANG and LC_ALL environment variables doesn't help. I must have read a dozen other questions here but I'm struggling to find anything that provides a clear indication of what the problem is or how to fix it.

Upvotes: 5

Views: 14127

Answers (1)

D.W.

Reputation: 3615

What's going on

pip is buggy. It can crash if any installed Python system library has a non-ASCII character in its description.

The relevant part of the error message is this:

UnicodeDecodeError: 'utf8' codec can't decode byte ...

The specific byte value isn't important. The crash is triggered by a certain kind of non-ASCII character in a .egg-info file in the system directory where Python packages are stored (e.g., /usr/lib/python2.7/site-packages). pip tries to parse all of those files, and when it encounters certain non-ASCII characters, it falls over and dies.
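To make the failure concrete, here is a minimal Python 3 sketch of the same decoding error. It assumes, purely for illustration, that the metadata was written in Latin-1, where the letter ö is the single byte 0xf6; the helper name first_undecodable_byte and the sample author string are hypothetical and not taken from pip.

```python
def first_undecodable_byte(raw):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the byte the decoder rejected.
        return exc.start, raw[exc.start]
    return None


# Hypothetical metadata line, encoded as Latin-1 rather than UTF-8.
latin1_bytes = "Author: Bj\u00f6rn".encode("latin-1")
utf8_bytes = "Author: Bj\u00f6rn".encode("utf-8")

print(first_undecodable_byte(latin1_bytes))  # (10, 246), and 246 == 0xf6
print(first_undecodable_byte(utf8_bytes))    # None
```

The same text encoded as UTF-8 decodes without complaint, which is why only some non-ASCII characters trigger the crash.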

How to fix the problem

There are two options:

  1. The right fix: Update pip to the latest version. This fixes the bug.

  2. The kludge: Remove the offending Python library package that causes pip to crash. That requires you to work out which package is responsible; unfortunately, pip doesn't give you much help with that, so you're going to have to do some sleuthing -- see the next section.

Obviously, the former option is preferable... but if for some reason you cannot upgrade pip, I'll describe how to follow the second approach.

How to find the Python library that's responsible for this

Here's how you can check your Python system packages and narrow down which one might be responsible for pip's crashes. We're going to look for any *.egg-info file in the Python site-packages/ directory that has a non-ASCII character. Try this:

cd /usr/lib/python2.7/site-packages
LC_ALL=C grep -P '[[:^ascii:]]' *.egg-info 2>/dev/null

(This requires GNU grep.) Look through the matches it finds. Check each file to see if it contains a non-ASCII character that matches the error message.

In my case, the error message mentioned can't decode byte 0xf6, so we're going to look for the file that contains a 0xf6 byte. We can inspect each matching file using a hex dump utility; I like to use hexdump -C.

To find the match, you might need to check other locations for Python packages, e.g., /usr/lib64/python2.7/site-packages, /usr/local/lib/python2.7/site-packages, and so on.
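If checking each file by hand gets tedious, the search can be sketched in Python 3 instead. This is only a rough sketch under some assumptions: the function name find_files_with_byte is made up for illustration, the directory list simply repeats the locations mentioned above, and, unlike the grep command, it also looks at the PKG-INFO file inside any .egg-info entry that is a directory. Run it with any Python 3 interpreter; it only reads files and does not need pip to work.

```python
import os


def find_files_with_byte(directories, byte_value, suffix=".egg-info"):
    """Return (path, offset) pairs for metadata files containing byte_value."""
    needle = bytes([byte_value])
    hits = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # skip locations that do not exist on this system
        for name in sorted(os.listdir(directory)):
            if not name.endswith(suffix):
                continue
            path = os.path.join(directory, name)
            # An .egg-info entry may be a plain file or a directory that
            # holds a PKG-INFO file (assumption: that is the file of interest).
            if os.path.isdir(path):
                path = os.path.join(path, "PKG-INFO")
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            offset = data.find(needle)
            if offset != -1:
                hits.append((path, offset))
    return hits


if __name__ == "__main__":
    # Locations taken from the text above; adjust them for your system.
    site_dirs = [
        "/usr/lib/python2.7/site-packages",
        "/usr/lib64/python2.7/site-packages",
        "/usr/local/lib/python2.7/site-packages",
    ]
    # 0xf6 is the byte from my error message; use the one from yours.
    for path, offset in find_files_with_byte(site_dirs, 0xF6):
        print("{}: byte 0xf6 at offset {}".format(path, offset))
```

Each reported path is a candidate; the offset will not necessarily equal the position in pip's error message, so confirm with hexdump -C before removing anything.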

Once you find the Python package that is causing the problems, you can try removing that library (if it is not an essential package).

Other possible explanations and troubleshooting steps

On older systems, this error could also be triggered if your current path or your username contains any non-ASCII characters.

Some people reported success after clearing the LC_ALL and LANG environment variables, or after setting them to different values, such as export LC_ALL="en_US.UTF-8" LANG="en_US.UTF-8". This didn't help me, though.


Upvotes: 12
