Rico

UnicodeDecodeError When Opening a tar File in Python 3

I'm using Linux Mint 18.1 and Python 3.5.2.

I have a library that currently works under Python 2.7, and I need to use it in a Python 3 project. I'm updating it and have run into a Unicode problem that I can't seem to fix.

First, the archive is created on a Linux system with tar cvjf tarfile.tbz2, and it is later opened in the Python library with open(tarfile).

If I run the code as is, using Python 3, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 11: invalid start byte

My first attempt at a fix was to open it with open(tarfile, encoding='utf-8'), since I was under the impression that tar would simply use whatever encoding the file system gave it. When I do this, I get the same error (only the byte value changes).

If I try with another encoding, say latin-1, I get the following error:

TypeError: Unicode-objects must be encoded before hashing

This leads me to believe that utf-8 is correct, but I might be misunderstanding something.
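For reference, both errors can be reproduced without the library. This is a minimal sketch that assumes the library hashes the file contents (the TypeError points at hashlib); the byte string is a made-up stand-in for the start of the archive, not its real contents, and SHA-256 is an arbitrary choice of hash. It shows that latin-1 can decode any byte sequence, so the second error comes from the later hashing step, not from the decoding. The exact TypeError wording varies between Python versions.

```python
import hashlib

# Hypothetical stand-in for the start of a .tbz2 file (not its real contents).
# 0xc1 can never start a valid UTF-8 sequence.
raw = b"BZh9\xc1\x00"

# Reading in text mode with encoding='utf-8' performs this decode, which fails.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc

# latin-1 maps every byte 0x00-0xff to a code point, so this decode always succeeds...
text = raw.decode("latin-1")

# ...but the result is a str, and hashlib only accepts bytes-like objects.
# (Assumption: the library hashes the file contents; sha256 is an arbitrary choice.)
try:
    hashlib.sha256(text)
except TypeError as exc:
    hash_error = exc

print(type(utf8_error).__name__, utf8_error.start)  # UnicodeDecodeError 4
print(type(hash_error).__name__)                    # TypeError
```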

Can anyone provide suggestions?

Answers (1)

Rico

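I was going down the wrong path, thinking this was some strange encoding problem, when it was really a simple one: open() defaults to mode 'r', which in Python 3 means text mode, so every read() decodes the file's bytes into str. In Python 2 on Linux, the text/binary distinction is effectively a no-op, which is why the library worked there.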
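A small sketch of the difference between the two modes, using a throwaway file with hypothetical contents: without an explicit mode, open() returns a text stream, while 'rb' returns a binary stream whose read() gives back the raw bytes untouched.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"\xc1\x00")  # bytes that are not valid UTF-8

    # No mode given: open() uses 'r', which in Python 3 means text mode ('rt').
    with open(path) as f:
        default_mode = f.mode
        default_is_text = isinstance(f, io.TextIOBase)

    # 'rb': binary mode, read() returns the raw bytes without any decoding.
    with open(path, "rb") as f:
        binary_is_text = isinstance(f, io.TextIOBase)
        data = f.read()

print(default_mode, default_is_text)  # r True
print(binary_is_text, data)           # False b'\xc1\x00'
```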

The fix is to open the file in binary mode: open(tarfile, 'rb').
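Here is a sketch of what the fixed read path might look like. The helper name hash_archive and the choice of SHA-256 are hypothetical, since the library's hashing code isn't shown; the archive is built with tarfile's 'w:bz2' mode as a stand-in for tar cvjf. If the library also needs the archive's contents, the standard tarfile module can open a .tbz2 directly with mode 'r:bz2' and handles decompression itself.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def hash_archive(path, chunk_size=65536):
    """Hash a file's raw bytes (hypothetical helper; the real library's may differ)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: read() yields bytes, never decodes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Build a small bzip2-compressed tar, roughly what `tar cjf` produces.
    path = os.path.join(tmp, "tarfile.tbz2")
    payload = b"\xc1 not UTF-8"
    with tarfile.open(path, "w:bz2") as tar:
        info = tarfile.TarInfo("data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    digest = hash_archive(path)
    with open(path, "rb") as f:
        expected = hashlib.sha256(f.read()).hexdigest()

    # Reading members: tarfile decompresses bz2 itself, no manual open() needed.
    with tarfile.open(path, "r:bz2") as tar:
        names = tar.getnames()
        extracted = tar.extractfile("data.bin").read()

print(digest == expected, names, extracted == payload)  # True ['data.bin'] True
```

Reading in chunks keeps memory use flat for large archives; for small files, hashlib.sha256(f.read()) works just as well.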

The Unicode error was a head fake... I should have seen this one coming. :facepalm:
