Jordan barkley

Reputation: 23

Python: Reading files in binary mode does not return expected values

Test File Contents (shown in hex)

00010203 04050607 08090A0B 0C0D0E0F 
10111213 14151617 18191A1B 1C1D1E1F 
20212223 24252627 28292A2B 2C2D2E2F 
30313233 34353637 38393A3B 3C3D3E3F 
40414243 44454647 48494A4B 4C4D4E4F 
50515253 54555657 58595A5B 5C5D5E5F 
60616263 64656667 68696A6B 6C6D6E6F 
70717273 74757677 78797A7B 7C7D7E7F 
80818283 84858687 88898A8B 8C8D8E8F 
90919293 94959697 98999A9B 9C9D9E9F 
A0A1A2A3 A4A5A6A7 A8A9AAAB ACADAEAF 
B0B1B2B3 B4B5B6B7 B8B9BABB BCBDBEBF 
C0C1C2C3 C4C5C6C7 C8C9CACB CCCDCECF 
D0D1D2D3 D4D5D6D7 D8D9DADB DCDDDEDF 
E0E1E2E3 E4E5E6E7 E8E9EAEB ECEDEEEF 
F0F1F2F3 F4F5F6F7 F8F9FAFB FCFDFEFF
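A file with exactly these contents (every byte value from 0x00 to 0xFF, in order) can be recreated with a short sketch like the one below. The filename `test.txt` matches the test code; the `TEST_DATA` name is just an illustrative choice.

```python
# The test file: every byte value 0x00..0xFF, in order (256 bytes total).
TEST_DATA = bytes(range(256))

# Write it in binary mode so no newline translation happens on any platform.
with open('test.txt', 'wb') as f:
    f.write(TEST_DATA)
```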

Test Code

# open file 1 in binary mode (the filename must be a string)
with open('test.txt', 'rb') as f1:
    # read and print one byte at a time, 256 bytes in total
    for address in range(256):
        byte = f1.read(1)
        print(byte)

What is Returned

b'\x00'
b'\x01'
b'\x02'
b'\x03'
b'\x04'
b'\x05'
b'\x06'
b'\x07'
b'\x08'
b'\t'
b'\n'
b'\x0b'
b'\x0c'
b'\r'
b'\x0e'
b'\x0f'
b'\x10'
b'\x11'
b'\x12'
b'\x13'
b'\x14'
b'\x15'
b'\x16'
b'\x17'
b'\x18'
b'\x19'
b'\x1a'
b'\x1b'
b'\x1c'
b'\x1d'
b'\x1e'
b'\x1f'
b' '
b'!'
b'"'
b'#'
b'$'
b'%'
b'&'
b"'"
b'('
b')'
b'*'
b'+'
b','
b'-'
b'.'
b'/'
b'0'
b'1'
b'2'
b'3'
b'4'
b'5'
b'6'
b'7'
b'8'
b'9'
b':'
b';'
b'<'
b'='
b'>'
b'?'
b'@'
b'A'
b'B'
b'C'
b'D'
b'E'
b'F'
b'G'
b'H'
b'I'
b'J'
b'K'
b'L'
b'M'
b'N'
b'O'
b'P'
b'Q'
b'R'
b'S'
b'T'
b'U'
b'V'
b'W'
b'X'
b'Y'
b'Z'
b'['
b'\\'
b']'
b'^'
b'_'
b'`'
b'a'
b'b'
b'c'
b'd'
b'e'
b'f'
b'g'
b'h'
b'i'
b'j'
b'k'
b'l'
b'm'
b'n'
b'o'
b'p'
b'q'
b'r'
b's'
b't'
b'u'
b'v'
b'w'
b'x'
b'y'
b'z'
b'{'
b'|'
b'}'
b'~'
b'\x7f'
b'\x80'
b'\x81'
b'\x82'
b'\x83'
b'\x84'
b'\x85'
b'\x86'
b'\x87'
b'\x88'
b'\x89'
b'\x8a'
b'\x8b'
b'\x8c'
b'\x8d'
b'\x8e'
b'\x8f'
b'\x90'
b'\x91'
b'\x92'
b'\x93'
b'\x94'
b'\x95'
b'\x96'
b'\x97'
b'\x98'
b'\x99'
b'\x9a'
b'\x9b'
b'\x9c'
b'\x9d'
b'\x9e'
b'\x9f'
b'\xa0'
b'\xa1'
b'\xa2'
b'\xa3'
b'\xa4'
b'\xa5'
b'\xa6'
b'\xa7'
b'\xa8'
b'\xa9'
b'\xaa'
b'\xab'
b'\xac'
b'\xad'
b'\xae'
b'\xaf'
b'\xb0'
b'\xb1'
b'\xb2'
b'\xb3'
b'\xb4'
b'\xb5'
b'\xb6'
b'\xb7'
b'\xb8'
b'\xb9'
b'\xba'
b'\xbb'
b'\xbc'
b'\xbd'
b'\xbe'
b'\xbf'
b'\xc0'
b'\xc1'
b'\xc2'
b'\xc3'
b'\xc4'
b'\xc5'
b'\xc6'
b'\xc7'
b'\xc8'
b'\xc9'
b'\xca'
b'\xcb'
b'\xcc'
b'\xcd'
b'\xce'
b'\xcf'
b'\xd0'
b'\xd1'
b'\xd2'
b'\xd3'
b'\xd4'
b'\xd5'
b'\xd6'
b'\xd7'
b'\xd8'
b'\xd9'
b'\xda'
b'\xdb'
b'\xdc'
b'\xdd'
b'\xde'
b'\xdf'
b'\xe0'
b'\xe1'
b'\xe2'
b'\xe3'
b'\xe4'
b'\xe5'
b'\xe6'
b'\xe7'
b'\xe8'
b'\xe9'
b'\xea'
b'\xeb'
b'\xec'
b'\xed'
b'\xee'
b'\xef'
b'\xf0'
b'\xf1'
b'\xf2'
b'\xf3'
b'\xf4'
b'\xf5'
b'\xf6'
b'\xf7'
b'\xf8'
b'\xf9'
b'\xfa'
b'\xfb'
b'\xfc'
b'\xfd'
b'\xfe'
b'\xff'

Running the code returns this. For my program to work correctly, I need values like b'!' to be returned as b'\x21'. What can I do to accomplish this? Thanks for your help!

Upvotes: 2

Views: 1350

Answers (1)

Martijn Pieters

Reputation: 1125388

The byte values are correct. Python just chooses to show you ASCII characters when possible, to aid debugging:

>>> bytes([0x21])
b'!'
>>> bytes([0x21])[0]
33

The actual byte value is still 33 decimal (21 hexadecimal), but that byte maps to an ASCII character. Whenever you produce the representation (repr()) of a bytes object, and print() shows bytes the same way, any byte that is a printable ASCII character is displayed as that character, because that is far more readable. A few control characters (tab, newline, carriage return) are displayed using their escape sequences \t, \n and \r, and only the remaining bytes use \xhh hex escapes. Would you rather Python display b'\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x0a' or b'Hello world\n' when debugging code handling bytes?
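To see that nothing is lost, you can compare the two spellings directly and look at the repr() of a few sample values. This is a minimal sketch; the sample values are arbitrary picks covering a printable character, two named escapes and a non-ASCII byte.

```python
# b'!' and b'\x21' are two spellings of the same one-byte value;
# repr() simply prefers the printable ASCII form.
same = b'!' == b'\x21'
print(same)

# repr() of a few single-byte values: printable ASCII, named escapes, and a \x escape.
for value in (0x21, 0x0a, 0x09, 0x80):
    print(hex(value), repr(bytes([value])))
```

The comparison prints True, and the loop shows b'!', b'\n', b'\t' and b'\x80' respectively, even though all four were built from plain integers.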

If you want to display hex values, explicitly format the byte value:

print(format(byte[0], '02x'))

to display it as a two-digit, zero-padded lowercase hex value, or

print(format(byte[0], '#04x'))

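to include a leading 0x (the width of 4 counts the 0x prefix). Use X instead of x for uppercase digits; combined with #, that also turns the prefix into 0X.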

Demo:

>>> format(bytes([0x21])[0], '02x')
'21'
>>> format(bytes([0x21])[0], '#04x')
'0x21'
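Applied to the loop from the question, that could look like the sketch below. The `hex_bytes` generator is a hypothetical helper, not part of any library; it stops at end of file (when read(1) returns b'') instead of assuming the file is exactly 256 bytes long.

```python
def hex_bytes(path):
    """Yield each byte of the file at `path` as a '0x'-prefixed, two-digit hex string."""
    with open(path, 'rb') as f:
        # iter() with a b'' sentinel keeps calling f.read(1) until end of file
        for byte in iter(lambda: f.read(1), b''):
            # byte is a length-1 bytes object; byte[0] is its integer value
            yield format(byte[0], '#04x')
```

For example, `for h in hex_bytes('test.txt'): print(h)` would print 0x00 through 0xff, one per line, for the test file in the question, with 0x21 where the original loop showed b'!'.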

If you want to display a series of bytes, you can use the binascii.hexlify() function:

>>> from binascii import hexlify
>>> hexlify(b'Hello world\n')
b'48656c6c6f20776f726c640a'
>>> print(hexlify(b'Hello world\n').decode('ASCII'), b'Hello world\n', sep='\t')
48656c6c6f20776f726c640a    b'Hello world\n'
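On Python 3.5 and later, bytes objects also have a hex() method that returns a str directly, so no decode() step is needed; from Python 3.8 it accepts a separator argument. A small sketch comparing the two:

```python
from binascii import hexlify

data = b'Hello world\n'

# hexlify() returns bytes, so decode it to get a str.
via_hexlify = hexlify(data).decode('ascii')

# bytes.hex() (Python 3.5+) returns a str directly.
via_method = data.hex()

print(via_hexlify == via_method)
print(via_method)

# Python 3.8+: a separator between bytes, similar to the grouped dump in the question.
print(data.hex(' '))
```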

With a bit of formatting, you can make any binary file display in both hexadecimal and ASCII representations.
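As one possible way to do that formatting, here is a minimal sketch of a classic hex dump: an offset column, the bytes in hex, then the printable ASCII characters with everything else shown as a dot. The `hexdump` function name and its 16-bytes-per-line, uppercase layout are assumptions chosen to resemble the dump in the question.

```python
def hexdump(data, width=16):
    """Return a hex + ASCII dump of `data`, `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Hex column: two uppercase digits per byte, space-separated,
        # padded so the ASCII column lines up on a short final row.
        hex_part = ' '.join(format(b, '02X') for b in chunk).ljust(width * 3 - 1)
        # ASCII column: printable ASCII (0x20..0x7E) as-is, anything else as '.'.
        text_part = ''.join(chr(b) if 0x20 <= b < 0x7f else '.' for b in chunk)
        lines.append(f'{offset:08X}  {hex_part}  {text_part}')
    return '\n'.join(lines)
```

To dump the test file, read it all at once in binary mode and pass the result in: `with open('test.txt', 'rb') as f: print(hexdump(f.read()))`.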

Upvotes: 4
