Reputation: 8068
I have the following code:
with open("heart.png", "rb") as f:
    byte = f.read(1)
    while byte:
        byte = f.read(1)
        strb = byte.decode("utf-8", "ignore")
        print(strb)
When reading the bytes from "heart.png" I have to read hex bytes such as:
b'\x1a', b'\xff', b'\xa4', etc.
and also bytes in this form:
b'A', b'D', b'O', b'B', b'E', etc. (spells ADOBE)
For some reason, converting from bytes to a string with the code above does not work for the bytes shown in hex form, but it works for everything else.
So when b'\x1a' comes along, it is converted to "" (empty string), and when b'H' comes along, it is converted to "H".
Does anyone know why this is the case?
Upvotes: 4
Views: 184
Reputation: 898
There are a few things going on here.
The PNG file format can contain text chunks encoded in either Latin-1 or UTF-8. The tEXt chunks are encoded in Latin-1, so you would need to decode them using the 'latin-1' codec. The iTXt chunks are encoded in UTF-8 and would need to be decoded with the 'utf-8' codec.
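As a small illustration of the difference, here is a sketch that decodes the same word from two made-up payloads. The byte strings are hypothetical stand-ins for what a tEXt or iTXt chunk might carry; they are not read from a real file.

```python
# Hypothetical payloads, chosen only for illustration.
text_payload = b"Caf\xe9"               # Latin-1: 'é' is the single byte 0xE9
itxt_payload = "Café".encode("utf-8")   # UTF-8: 'é' becomes the two bytes 0xC3 0xA9

# Each payload needs the codec that matches how it was written.
latin1_text = text_payload.decode("latin-1")
utf8_text = itxt_payload.decode("utf-8")

print(latin1_text, utf8_text)
```

Decoding the Latin-1 payload with 'utf-8' would raise a UnicodeDecodeError (or lose the character with errors="ignore"), which is why the chunk type matters.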
However, you appear to be decoding individual bytes, whereas a character in UTF-8 may span multiple bytes. A lone byte such as b'\xff' is not valid UTF-8 by itself, so errors="ignore" silently drops it and you get an empty string, while ASCII bytes such as b'H' decode fine. So, assuming you want to read UTF-8 strings, read the entire byte sequence of the string before attempting to decode it.
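To make this concrete, here is a minimal sketch comparing byte-by-byte decoding with decoding the whole sequence at once. The sample string is an assumption chosen for illustration; it has nothing to do with the contents of heart.png.

```python
data = "héllo".encode("utf-8")   # b'h\xc3\xa9llo'; 'é' occupies two bytes

# One byte at a time: each half of 'é' is invalid on its own,
# and errors="ignore" discards it without complaint.
per_byte = "".join(bytes([b]).decode("utf-8", "ignore") for b in data)

# The whole sequence at once: the two bytes are combined into 'é'.
whole = data.decode("utf-8")

# A byte that can never appear in valid UTF-8 simply vanishes.
lone = b"\xff".decode("utf-8", "ignore")

print(repr(per_byte), repr(whole), repr(lone))
```

This is the same effect seen in the question: ASCII bytes survive, while bytes that are only meaningful as part of a longer sequence (or not at all) come out as empty strings.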
If instead you are trying to interpret binary data from the file, take a look at the struct module, which is intended for that purpose.
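As a rough sketch of that approach, the code below walks the chunks of a PNG stream with struct and decodes a tEXt chunk. The helper names (make_chunk, iter_chunks) and the in-memory sample are hypothetical and exist only so the example runs without a file; CRC values are read but not verified.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, payload):
    """Build one chunk: big-endian length, 4-byte type, payload, CRC-32."""
    crc = zlib.crc32(chunk_type + payload)
    header = struct.pack(">I4s", len(payload), chunk_type)
    return header + payload + struct.pack(">I", crc)

def iter_chunks(stream):
    """Yield (type, payload) pairs from a PNG byte stream."""
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        length, chunk_type = struct.unpack(">I4s", header)
        payload = stream.read(length)
        stream.read(4)  # skip the CRC; this sketch does not check it
        yield chunk_type, payload

# Made-up sample: signature, one tEXt chunk, and the closing IEND chunk.
sample = io.BytesIO(
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"Software\x00Adobe")
    + make_chunk(b"IEND", b"")
)

chunks = list(iter_chunks(sample))
# A tEXt payload is keyword, NUL separator, then text, all in Latin-1.
keyword, _, value = chunks[0][1].partition(b"\x00")
decoded = (keyword.decode("latin-1"), value.decode("latin-1"))
print(decoded)
```

To try it on a real file, pass open("heart.png", "rb") to iter_chunks in place of the in-memory sample, and only decode the payloads of text chunks rather than every byte in the file.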
Upvotes: 4