UnicodeDecodeError When Opening a tar File in Python 3

Question

I'm using Linux Mint 18.1 and Python 3.5.2.

I have a library that currently works under Python 2.7, and I need to use it in a Python 3 project. While updating it, I've run into a Unicode problem that I can't seem to fix.

The file is created on a Linux system with tar cvjf tarfile.tbz2 and is later opened in the Python library with open(tarfile).

If I run the code as is, using Python 3, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 11: invalid start byte
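
For reference, the failure can be reproduced with a small sketch like the one below. It assumes the library does little more than a text-mode open() followed by read(); make_sample_archive and read_as_text are hypothetical helper names, not part of the library. In text mode Python 3 decodes the bytes it reads, and a bzip2-compressed archive is binary data rather than encoded text.

```python
import io
import tarfile


def make_sample_archive(path):
    """Write a small bzip2-compressed tar archive (hypothetical sample data)."""
    payload = b"hello from inside the archive\n" * 50
    info = tarfile.TarInfo(name="sample.txt")
    info.size = len(payload)
    with tarfile.open(path, "w:bz2") as archive:
        archive.addfile(info, io.BytesIO(payload))


def read_as_text(path, encoding=None):
    """Read the file the way a plain open(path) does in Python 3 (text mode)."""
    # encoding=None falls back to the locale's preferred encoding,
    # which is typically UTF-8 on a modern Linux desktop.
    with open(path, encoding=encoding) as handle:
        return handle.read()
```

Calling read_as_text(path, encoding='utf-8') on such an archive should raise UnicodeDecodeError, while encoding='latin-1' returns a str, because latin-1 maps every byte value to a character.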

My first attempt at a fix was to open it with open(tarfile, encoding='utf-8'), since I was under the impression that tar would just use whatever encoding the file system gave it. This produces the same error, only with a different byte value.

If I try with another encoding, say latin-1, I get the following error:

TypeError: Unicode-objects must be encoded before hashing

This leads me to believe that utf-8 is the correct encoding, but I might be misunderstanding something.
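
One possible reading of the two errors: latin-1 can decode any byte, so it never raises UnicodeDecodeError, and the resulting str then fails later when it is passed to a hash function that expects bytes. A minimal sketch of handling the archive as binary data follows, assuming the library hashes the file contents with hashlib (the library code is not shown, so this is a guess); sha256_of_file and list_members are hypothetical names.

```python
import hashlib
import tarfile


def sha256_of_file(path, chunk_size=65536):
    """Hash the raw bytes of a file; hashlib accepts bytes, not str."""
    digest = hashlib.sha256()
    # "rb" opens the file in binary mode, so no decoding is attempted.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_members(path):
    """Return the member names, letting tarfile handle the bzip2 layer."""
    with tarfile.open(path, "r:bz2") as archive:
        return archive.getnames()
```

With this approach no encoding argument is involved at all: the archive is never treated as text, and only the contents of individual members would need decoding if they happen to be text files.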

Can anyone provide suggestions?

