Reputation: 189
I was experimenting with reading a file in binary mode in Python (using an image file). Here is my simple code:
# Open the image in binary mode; the with-block closes the file afterwards
with open("./img.jpg", "rb") as f:
    print(f.read())
The output was a massive byte string. Below is just an excerpt. It is most likely not from the beginning of the string, because so many bytes were printed that my console window won't let me scroll any higher than the 9d@ you see at the start of the line below:
9d@\xe7\x8a\xfe\xc3<6\xc8 @9\xf9Td\x1c\x8c\x91\xff\x00\xd7\xc7\xa5\x7f\x9f\xb7\xed\xd5\xe3\xfdo\xe1\xd7\xede\xfb2x\xeb\xc3\xb8\x97\xc4?\t\xbc9}\xf1w\xc3\xb6d\xed[\xbdg\xc2?\x104\xcdr\xda\xc2F\x19"\x1b\xd6\xf0\x9cVr\x0c|\xc9p\xcb\xc8$\x0f\xef\x7f\xe1g\x8b4\x1f\x1dx?\xc2\x9e6\xf0\xb5\xfcz\xa7\x85\xbcg\xe1\xbd\x07\xc5\xfe\x18\xd4\xe2`\xd1\xea>\x1d\xf1>\x95g\xaeh7\xe8W#\x13iZ\x85\x9c\x98\xcf\x1ef99\xaf;4\x8bT\xb0\x13_\xc9$\xfd}\xa5I/\xfc\x95\xb5\xe9\x16~S\xc5\x98\x1cD3,\xc7\x1dV?\xec\xb8\xfa\xdc\x94Z\xea\xf0\xd40\xd4\xeb+\xbd\xdcy\xe9\xc9\xa5\xb2\x9c{\x9e\xbf\x1f\xf0\xfd?\x98&\xad*\x06\x07?O\xe4j\x94a\x8eq\xdb\xf4\xab\xb1\xe7\xa1\xf4\xcf\xe3\xc75\xe4\xafyZ\xfa\xee~uSM<\xc5\x11(\xe8?\x1e\xff\x00\x81\xa7\x81\x81\x8aZ*\x0e^}t
Now, I've noticed something interesting. Some of these "bytes" have non-hex characters, and some have more than 2 characters (even though 1 byte in hex only has 2 characters). For example, a valid byte would be something like \x8a (which can be seen towards the beginning of the string). However, this string also has stuff like \xc3<6 or 9d@ or \xf9Td. As can be seen in these examples, they feature characters like '@' or '<' or 'T' which aren't hex characters, and these examples are also more than 2 characters long.
How am I to interpret this? Are all of these "bytes" even really supposed to be viewed as bytes? Does this have something to do with the file format? Perhaps this is not hex after all? Can someone please help me make sense of byte strings like this?
Upvotes: 0
Views: 1333
Reputation: 27404
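f.read() returns a bytes object, because the file was opened in binary mode. print implicitly converts it to its string representation. In that representation, every byte whose value corresponds to a printable ASCII character is shown as that character, and every other byte is shown as an escape, mostly in \xNN hex form (a few, such as tab and newline, appear as \t and \n). So \xc3<6 is not one byte but three: 0xc3, then '<' (0x3c), then '6' (0x36). Likewise, 9d@ is the three bytes 0x39, 0x64 and 0x40. Nothing here depends on the file format; it is only how Python displays bytes.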
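To see this for yourself, here is a small sketch that builds a three-byte value by hand and looks at it in a few ways. The variable names (data, text, hex_view) are just illustrative, and the byte values are taken from the \xc3<6 fragment in the question rather than from the actual image file.

```python
# Three bytes: 0xC3 (not printable ASCII), 0x3C ('<'), 0x36 ('6')
data = bytes([0xC3, 0x3C, 0x36])

# print(data) displays repr(data): printable bytes appear as characters,
# the rest as \xNN escapes
text = repr(data)
print(text)          # b'\xc3<6'

# It really is three bytes, each an integer in the range 0..255
print(len(data))     # 3
print(list(data))    # [195, 60, 54]

# An unambiguous all-hex view, two hex digits per byte
hex_view = " ".join(f"{b:02x}" for b in data)
print(hex_view)      # c3 3c 36
```

If you want to inspect the image without the mixed display, applying the same join (or data.hex()) to the result of f.read() should give you a pure hex dump.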
Upvotes: 0