I'm using Linux Mint 18.1 and Python 3.5.2.
I have a library that currently works using Python 2.7. I need to use the library for a Python 3 project. I'm updating it and have run into a Unicode problem that I can't seem to fix.
First, a file is created via tar cvjf tarfile.tbz2 (on a Linux system) and is later opened in the Python library as open(tarfile).
If I run the code as is, using Python 3, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 11: invalid start byte
My first attempt at a fix was to open it as open(tarfile, encoding='utf-8'), as I was under the impression that tar would just use whatever encoding the file system gave it. When I do this, I get the same error (only the byte value changes).
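Decoding some bytes directly shows why passing encoding='utf-8' cannot help. The sketch below uses a hypothetical byte string modeled on the start of a bzip2 stream (the "BZh9" header followed by the "1AY&SY" block magic); the real contents of tarfile.tbz2 are an assumption. The byte 0xC1 can never appear in valid UTF-8, since it could only start an overlong encoding, so compressed data is essentially guaranteed to contain something the UTF-8 decoder rejects.

```python
# Hypothetical bytes modeled on the start of a bzip2 stream: "BZh9" header,
# "1AY&SY" block magic, then two arbitrary bytes so that 0xc1 lands at index 11.
data = b"BZh91AY&SY\x00\xc1"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the except block ends
    print(exc)   # 'utf-8' codec can't decode byte 0xc1 in position 11: invalid start byte
```

The file simply is not UTF-8 text, so no choice of text encoding makes it "correct".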
If I try with another encoding, say latin-1, I get the following error:
TypeError: Unicode-objects must be encoded before hashing
This leads me to believe that utf-8 is correct, but I might be misunderstanding.
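The latin-1 result points the other way. latin-1 maps every byte 0x00 to 0xFF to a single code point, so decoding never fails; the read succeeds and the failure moves later, to the point where the library hashes the data. The sketch below assumes the library hashes with hashlib (the error message matches hashlib's), and sha1 is an arbitrary stand-in for whatever algorithm it actually uses; the input bytes are hypothetical.

```python
import hashlib

# Hypothetical raw bytes standing in for the start of the .tbz2 file.
raw = b"BZh91AY&SY\x00\xc1"

# latin-1 maps each byte 0x00-0xff to one code point, so this never raises...
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw  # ...and it round-trips losslessly.

# ...but the result is a str, and hashlib only accepts bytes-like objects.
try:
    hashlib.sha1(text)
except TypeError as exc:
    hash_error = exc
    print(exc)  # exact wording differs between Python versions

# Hashing the original bytes works fine.
digest = hashlib.sha1(raw).hexdigest()
```

So the TypeError does not mean utf-8 was right; it means the data should never have been decoded into str at all.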
Can anyone provide suggestions?
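I was going down the wrong path thinking this was some strange encoding problem, when it was really just that open() defaults to reading in text mode ('r'). In Python 3, text mode decodes the file's bytes into str, which fails on binary data such as a bzip2-compressed archive. In Python 2 on Linux, the difference between 'r' and 'rb' is a no-op, which is why the library worked there.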
The fix is to open the file in binary mode, open(tarfile, 'rb'), so that read() returns bytes and no decoding takes place.
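A minimal sketch of the fix, assuming the library reads the whole file and hashes it (the SHA-256 choice and the helper name hash_archive are hypothetical). To stay self-contained it builds the archive with the standard tarfile module in 'w:bz2' mode as a stand-in for tar cvjf, then reads it back with 'rb'.

```python
import hashlib
import os
import tarfile
import tempfile


def hash_archive(path):
    """Hypothetical helper: read a file as raw bytes and return its SHA-256 hex digest."""
    # 'rb' yields bytes, so no text decoding happens and hashlib accepts the data.
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha256(data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for `tar cvjf tarfile.tbz2 ...`: a small bzip2-compressed tar.
    member = os.path.join(tmp, "hello.txt")
    with open(member, "w") as f:
        f.write("hello\n")
    archive = os.path.join(tmp, "tarfile.tbz2")
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(member, arcname="hello.txt")

    digest = hash_archive(archive)
    with open(archive, "rb") as f:
        header = f.read(3)  # bzip2 streams start with b"BZh"

print(header)       # b'BZh'
print(len(digest))  # 64
```

One side note: naming the path variable tarfile, as in the question, would shadow the standard-library tarfile module if it were imported in the same scope, which is why the sketch uses archive instead.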
The Unicode error was a head fake... should have seen this one coming. :facepalm: