Bleron Qorri

Reputation: 33

Unicode not being interpreted as Unicode in Python

I've saved a unicode string of this format in a text file: b'\x1e\x80E\xd7\xd4M\x94\xa8\xb4\xf3bl[^'. When I read it back from this external text file, it is read as a normal string.

I've tried reading the file in binary mode, e.g. open(celesi_file_path, "rb"):

# Read both files as raw bytes; "with" closes each file afterwards
with open(ciphertext_file_path, "rb") as fciphertext:
    ciphertext = fciphertext.read()
with open(celesi_file_path, "rb") as fkey:
    celesi = fkey.read()

# Decode the raw bytes as Latin-1 text
ciphertext = ciphertext.decode('latin-1')
celesi = celesi.decode('latin-1')

print(type(celesi))
print(type(ciphertext))
print(celesi)
print(ciphertext)

The output is a string such as "b'\x1e\x80E\xd7\xd4M\x94\xa8\xb4\xf3bl[^'", whereas I expect a string of characters that is not in this escaped format.

Upvotes: 1

Views: 75

Answers (1)

ForceBru

Reputation: 44886

Take a look at this:

>>> data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
>>> str(data)
"b'\\xd0\\x9f\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'"

So, if you wrote str(data) to the file, you literally wrote the backslashes and the letter x. You didn't write the bytes themselves; you wrote the string representation of those bytes that Python provides. In this example, you wrote 51 bytes (!) instead of the original 12.
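If the file has already been written this way, the original bytes can probably still be recovered, because the text in the file is a valid Python bytes literal. Below is a minimal sketch, assuming the file contains exactly the output of str(...) for a bytes object, as in the question. The variable names are illustrative, and the string is built in memory here instead of being read from the actual file.

```python
import ast

# Example bytes taken from the question.
original = b'\x1e\x80E\xd7\xd4M\x94\xa8\xb4\xf3bl[^'

# This is the text that ends up in the file when str(...) of the bytes is written.
text_form = str(original)

# The text is a Python bytes literal, so ast.literal_eval can parse it
# back into a bytes object without executing arbitrary code.
recovered = ast.literal_eval(text_form)

print(len(original), len(text_form))  # raw length vs. length of the text form
print(recovered == original)
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for text that comes from a file.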

You should've written the bytes themselves:

with open("data.bin", "wb") as f:
    f.write(data)

Then open this file in binary mode ("rb") as well and read the bytes back.
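Putting the write and the read together, a round trip might look like the sketch below. The temporary directory and the names path and loaded are assumptions made for illustration so the example cleans up after itself; any writable path would do.

```python
import os
import tempfile

data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

# Hypothetical location: a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.bin")

    # Write the raw bytes, not their string representation.
    with open(path, "wb") as f:
        f.write(data)

    # Read them back in binary mode; no decoding step is involved.
    with open(path, "rb") as f:
        loaded = f.read()

print(type(loaded))    # a bytes object, not str
print(loaded == data)  # the round trip preserves the bytes
print(len(loaded))     # same length as the original data
```

For a key or ciphertext, the value should stay a bytes object after reading. Calling decode('latin-1') on it, as in the question, turns it into a str again.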

Upvotes: 1
