Greg Peckory

Reputation: 8068

Python converting bytes to string

I have the following code:

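with open("heart.png", "rb") as f:
    byte = f.read(1)  # read the file one byte at a time
    while byte:
        # decode the current byte, silently dropping anything that is not valid UTF-8
        strb = byte.decode("utf-8", "ignore")
        print(strb)
        byte = f.read(1)  # read the next byte only after handling this one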

When reading the bytes from "heart.png", I get bytes that Python displays in hex form, such as:

b'\x1a', b'\xff', b'\xa4', etc.

and also bytes in this form:

b'A', b'D', b'O', b'B', b'E', etc.    <- spells ADOBE

For some reason, when I use the above code to convert from bytes to a string, it does not seem to work with the bytes in hex form, but it works for everything else.

So when b'\x1a' comes along, it seems to convert it to "" (an empty string),

and when b'H' comes along, it converts it to "H".

Does anyone know why this is the case?

Upvotes: 4

Views: 184

Answers (1)

Kevin S

Reputation: 898

There are a few things going on here.

The PNG file format can contain text chunks encoded in either Latin-1 or UTF-8. The tEXt chunks are encoded in Latin-1 and you would need to decode them using the 'latin-1' codec. iTXt chunks are encoded in UTF-8 and would need to be decoded with the 'utf-8' codec.
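As a rough sketch of what that looks like in practice (assuming the whole file fits in memory; the helper names `iter_chunks` and `read_text_chunks` are made up for illustration), you can walk the chunks and decode each text chunk with the codec the PNG specification assigns to it. It skips CRC checks and only handles the fields needed to get at the text:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data: bytes):
    """Yield (chunk_type, chunk_data) for every chunk in a PNG byte string.
    CRCs are not verified in this sketch."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8].decode("ascii")
        yield chunk_type, data[pos + 8:pos + 8 + length]
        if chunk_type == "IEND":
            break
        pos += 12 + length

def read_text_chunks(data: bytes) -> list:
    """Return (keyword, text) pairs from the tEXt and iTXt chunks."""
    results = []
    for chunk_type, body in iter_chunks(data):
        if chunk_type == "tEXt":
            # keyword NUL text: both parts are Latin-1.
            keyword, text = body.split(b"\x00", 1)
            results.append((keyword.decode("latin-1"), text.decode("latin-1")))
        elif chunk_type == "iTXt":
            # keyword NUL flag method language NUL translated-keyword NUL text
            keyword, rest = body.split(b"\x00", 1)
            compressed, rest = rest[0], rest[2:]  # skip the flag and method bytes
            _language, rest = rest.split(b"\x00", 1)
            _translated, text = rest.split(b"\x00", 1)
            if compressed:
                text = zlib.decompress(text)  # compression method 0 is zlib
            # The keyword is Latin-1; the text itself is UTF-8.
            results.append((keyword.decode("latin-1"), text.decode("utf-8")))
    return results
```

Calling `read_text_chunks(f.read())` on a file opened in binary mode should give you a list of `(keyword, text)` pairs. Note that the keyword is always Latin-1, even inside an iTXt chunk; only the text part is UTF-8.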

However, you appear to be trying to decode individual bytes, whereas characters in UTF-8 may span multiple bytes. So assuming you want to read UTF-8 strings, what you should do is read in the entire length of the string you wish to decode before attempting to decode it.
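To see why decoding one byte at a time loses characters, compare the approaches below on a string containing multi-byte characters. This is an illustrative sketch; the two function names are hypothetical, while `codecs.getincrementaldecoder` is the standard-library tool for when the bytes really do arrive piecewise. It also accounts for what the question describes: a byte below 0x80 such as b'\x1a' is valid UTF-8 on its own and decodes to the control character '\x1a', which just prints as nothing visible, whereas bytes like b'\xff' or b'\xa4' can never appear alone in valid UTF-8, so the "ignore" error handler drops them.

```python
import codecs

def decode_byte_by_byte(data: bytes) -> str:
    """Decode each byte in isolation, like the loop in the question."""
    # Iterating over bytes yields ints, so wrap each one back into a 1-byte object.
    return "".join(bytes([b]).decode("utf-8", "ignore") for b in data)

def decode_incrementally(data: bytes) -> str:
    """Feed the bytes one at a time to an incremental UTF-8 decoder,
    which buffers incomplete multi-byte sequences until they are complete."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(bytes([b])) for b in data]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if input ends mid-character
    return "".join(parts)

data = "café ♥".encode("utf-8")                   # b'caf\xc3\xa9 \xe2\x99\xa5'
print(repr(decode_byte_by_byte(data)))            # 'caf ' (é and ♥ are silently dropped)
print(decode_incrementally(data))                 # café ♥
print(data.decode("utf-8"))                       # café ♥
print(repr(b"\x1a".decode("utf-8", "ignore")))    # '\x1a' (decoded, just not printable)
print(repr(b"\xff".decode("utf-8", "ignore")))    # '' (invalid on its own, dropped)
```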

If instead you are trying to interpret binary data from the file, take a look at the struct module which is intended for that purpose.
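For example, here is a minimal sketch (the function name `parse_ihdr` is hypothetical) that uses `struct.unpack` to pull the image dimensions out of the IHDR chunk, which the PNG specification requires to come first, straight after the 8-byte signature. Multi-byte integers in PNG are big-endian, hence the `>` in the format strings:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def parse_ihdr(data: bytes) -> dict:
    """Parse the IHDR chunk from the first 29 bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk header: 4-byte big-endian length followed by the 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk not found where expected")
    # IHDR body: width and height as unsigned 32-bit ints, then five 1-byte fields.
    width, height, bit_depth, color_type, compression, filter_method, interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "compression": compression,
        "filter_method": filter_method,
        "interlace": interlace,
    }
```

With the file from the question, something like `parse_ihdr(f.read(29))` on the file opened in binary mode should return the image's width, height and pixel format as plain integers, with no string decoding involved at all.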

Upvotes: 4
