Dennis

Difference between Python2 and Python3 when bytes are written to a file

There seems to be a difference between Python2 and Python3 when bytes are written to a file.

I would like to know why Python 3 writes these bytes differently from Python 2. Furthermore, what code changes are needed to get the same output as with Python 2?

The following Python code writes bytes to a file.

#!/usr/bin/python

badchars = (
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10"
"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff")

with open("output.txt", "w") as text_file:
    text_file.write(badchars)

With xxd we can see which bytes are written to output.txt (generated with Python 3):

└─$ xxd output.txt
00000000: 0102 0304 0506 0708 090a 0b0c 0d0e 0f10  ................
00000010: 1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ............... 
00000020: 2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
00000030: 3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
00000040: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50  ABCDEFGHIJKLMNOP 
00000050: 5152 5354 5556 5758 595a 5b5c 5d5e 5f60  QRSTUVWXYZ[\]^_`
00000060: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70  abcdefghijklmnop
00000070: 7172 7374 7576 7778 797a 7b7c 7d7e 7fc2  qrstuvwxyz{|}~.. <-- difference starts here
00000080: 80c2 81c2 82c2 83c2 84c2 85c2 86c2 87c2  ................
00000090: 88c2 89c2 8ac2 8bc2 8cc2 8dc2 8ec2 8fc2  ................
000000a0: 90c2 91c2 92c2 93c2 94c2 95c2 96c2 97c2  ................
000000b0: 98c2 99c2 9ac2 9bc2 9cc2 9dc2 9ec2 9fc2  ................
000000c0: a0c2 a1c2 a2c2 a3c2 a4c2 a5c2 a6c2 a7c2  ................
000000d0: a8c2 a9c2 aac2 abc2 acc2 adc2 aec2 afc2  ................
000000e0: b0c2 b1c2 b2c2 b3c2 b4c2 b5c2 b6c2 b7c2  ................
000000f0: b8c2 b9c2 bac2 bbc2 bcc2 bdc2 bec2 bfc3  ................
00000100: 80c3 81c3 82c3 83c3 84c3 85c3 86c3 87c3  ................
00000110: 88c3 89c3 8ac3 8bc3 8cc3 8dc3 8ec3 8fc3  ................
00000120: 90c3 91c3 92c3 93c3 94c3 95c3 96c3 97c3  ................
00000130: 98c3 99c3 9ac3 9bc3 9cc3 9dc3 9ec3 9fc3  ................
00000140: a0c3 a1c3 a2c3 a3c3 a4c3 a5c3 a6c3 a7c3  ................
00000150: a8c3 a9c3 aac3 abc3 acc3 adc3 aec3 afc3  ................
00000160: b0c3 b1c3 b2c3 b3c3 b4c3 b5c3 b6c3 b7c3  ................
00000170: b8c3 b9c3 bac3 bbc3 bcc3 bdc3 bec3 bf    ...............

xxd of output.txt (generated with Python 2):

└─$ xxd output.txt 
00000000: 0102 0304 0506 0708 090a 0b0c 0d0e 0f10  ................
00000010: 1112 1314 1516 1718 191a 1b1c 1d1e 1f20  ............... 
00000020: 2122 2324 2526 2728 292a 2b2c 2d2e 2f30  !"#$%&'()*+,-./0
00000030: 3132 3334 3536 3738 393a 3b3c 3d3e 3f40  123456789:;<=>?@
00000040: 4142 4344 4546 4748 494a 4b4c 4d4e 4f50  ABCDEFGHIJKLMNOP
00000050: 5152 5354 5556 5758 595a 5b5c 5d5e 5f60  QRSTUVWXYZ[\]^_`
00000060: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70  abcdefghijklmnop
00000070: 7172 7374 7576 7778 797a 7b7c 7d7e 7f80  qrstuvwxyz{|}~..
00000080: 8182 8384 8586 8788 898a 8b8c 8d8e 8f90  ................
00000090: 9192 9394 9596 9798 999a 9b9c 9d9e 9fa0  ................
000000a0: a1a2 a3a4 a5a6 a7a8 a9aa abac adae afb0  ................
000000b0: b1b2 b3b4 b5b6 b7b8 b9ba bbbc bdbe bfc0  ................
000000c0: c1c2 c3c4 c5c6 c7c8 c9ca cbcc cdce cfd0  ................
000000d0: d1d2 d3d4 d5d6 d7d8 d9da dbdc ddde dfe0  ................
000000e0: e1e2 e3e4 e5e6 e7e8 e9ea ebec edee eff0  ................
000000f0: f1f2 f3f4 f5f6 f7f8 f9fa fbfc fdfe ff    ...............


Answers (1)

ukBaz

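In Python 2, the str type was used for two different kinds of values, text and bytes, whereas in Python 3 these are separate, incompatible types: str for text and bytes for binary data. A Python 3 literal such as "\x80" is therefore the Unicode code point U+0080, not the byte 0x80, and a file opened in text mode ("w") has to encode it before writing. The default encoding depends on your locale; with UTF-8, every code point from U+0080 to U+00FF becomes two bytes (c2 xx or c3 xx), which is exactly the pattern in the Python 3 xxd dump.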
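A minimal sketch of that encoding step, assuming a UTF-8 locale (consistent with the c2/c3 bytes in the question's Python 3 dump). It uses a few sample characters rather than the full badchars string:

```python
import locale

# In Python 3 a str literal holds Unicode code points, so "\x80" is the
# code point U+0080, not the single byte 0x80.
text = "\x7f\x80\xbf\xc0\xff"
print(len(text))  # 5

# Text-mode files encode str before writing. Under UTF-8, code points
# U+0080 to U+00FF take two bytes each (a c2 or c3 lead byte plus one more).
encoded = text.encode("utf-8")
print(encoded)       # b'\x7f\xc2\x80\xc2\xbf\xc3\x80\xc3\xbf'
print(len(encoded))  # 9

# The encoding open() uses in text mode when none is given. This is
# platform and locale dependent, so it may not be UTF-8 everywhere.
print(locale.getpreferredencoding(False))
```

Only 0x7f stays a single byte; the four characters above it each grow to two bytes, matching the shift that starts at offset 0x7f in the Python 3 dump.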

If you want your values to be bytes, prefix each string literal with b.
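If the value already exists as a str and you would rather not retype the literals, one option, assuming every character is in the range \x00 to \xff, is to encode it with Latin-1, which maps code points U+0000 to U+00FF one-to-one onto bytes. The short string below is a hypothetical stand-in for badchars:

```python
# Hypothetical shortened stand-in for the question's badchars str.
text = "\x01\x0a\x7f\x80\xc0\xff"

# Latin-1 (ISO-8859-1) encodes each code point below 256 as exactly one
# byte with the same value, so the result equals the b-prefixed literal.
data = text.encode("latin-1")
print(data == b"\x01\x0a\x7f\x80\xc0\xff")  # True
print(len(data))  # 6
```

A character above U+00FF would raise UnicodeEncodeError here, which is a useful signal that the string is not really a sequence of byte values.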

If you then write them to a file opened with "w" (text mode), write() raises:

TypeError: write() argument must be str, not bytes

Writing bytes requires opening the file in binary write mode, i.e. "wb".

This would make your example look like this:

badchars = (
     b"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10"
     b"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
     b"\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30"
     b"\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40"
     b"\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50"
     b"\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60"
     b"\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70"
     b"\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80"
     b"\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90"
     b"\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0"
     b"\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0"
     b"\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0"
     b"\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0"
     b"\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0"
     b"\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0"
     b"\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff")


with open("/tmp/output.txt", "wb") as text_file:
    text_file.write(badchars)
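Since badchars is simply every byte value from 0x01 to 0xff in order, a shorter way to build the same value is bytes(range(1, 256)). This is a sketch; the temporary file name below is hypothetical and chosen so it does not overwrite /tmp/output.txt:

```python
import os
import tempfile

# bytes(range(1, 256)) yields the same 255 bytes, 0x01 to 0xff, as the
# explicit b"..." literal.
badchars = bytes(range(1, 256))

# Hypothetical demo path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "badchars_demo.bin")

with open(path, "wb") as f:
    written = f.write(badchars)  # binary write returns the byte count

print(written)  # 255
```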

I find hd's output easier to relate back to the bytes written. It shows that the file now contains the expected bytes, identical to the Python 2 output:

$ hd /tmp/output.txt 
00000000  01 02 03 04 05 06 07 08  09 0a 0b 0c 0d 0e 0f 10  |................|
00000010  11 12 13 14 15 16 17 18  19 1a 1b 1c 1d 1e 1f 20  |............... |
00000020  21 22 23 24 25 26 27 28  29 2a 2b 2c 2d 2e 2f 30  |!"#$%&'()*+,-./0|
00000030  31 32 33 34 35 36 37 38  39 3a 3b 3c 3d 3e 3f 40  |123456789:;<=>?@|
00000040  41 42 43 44 45 46 47 48  49 4a 4b 4c 4d 4e 4f 50  |ABCDEFGHIJKLMNOP|
00000050  51 52 53 54 55 56 57 58  59 5a 5b 5c 5d 5e 5f 60  |QRSTUVWXYZ[\]^_`|
00000060  61 62 63 64 65 66 67 68  69 6a 6b 6c 6d 6e 6f 70  |abcdefghijklmnop|
00000070  71 72 73 74 75 76 77 78  79 7a 7b 7c 7d 7e 7f 80  |qrstuvwxyz{|}~..|
00000080  81 82 83 84 85 86 87 88  89 8a 8b 8c 8d 8e 8f 90  |................|
00000090  91 92 93 94 95 96 97 98  99 9a 9b 9c 9d 9e 9f a0  |................|
000000a0  a1 a2 a3 a4 a5 a6 a7 a8  a9 aa ab ac ad ae af b0  |................|
000000b0  b1 b2 b3 b4 b5 b6 b7 b8  b9 ba bb bc bd be bf c0  |................|
000000c0  c1 c2 c3 c4 c5 c6 c7 c8  c9 ca cb cc cd ce cf d0  |................|
000000d0  d1 d2 d3 d4 d5 d6 d7 d8  d9 da db dc dd de df e0  |................|
000000e0  e1 e2 e3 e4 e5 e6 e7 e8  e9 ea eb ec ed ee ef f0  |................|
000000f0  f1 f2 f3 f4 f5 f6 f7 f8  f9 fa fb fc fd fe ff     |...............|
000000ff
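If you would rather keep the str literals, an alternative, assuming all characters are below U+0100, is to stay in text mode but pass an explicit single-byte encoding. Reading the file back in "rb" mode then checks the result from Python instead of hd. The file name is hypothetical:

```python
import os
import tempfile

# The question's str version, built compactly: code points U+0001 to U+00FF.
badchars = "".join(chr(i) for i in range(1, 256))

# Hypothetical demo path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "badchars_latin1.txt")

# encoding="latin-1" writes each code point below 256 as one byte;
# newline="" disables newline translation, so "\n" (0x0a) is not turned
# into os.linesep (b"\r\n" on Windows).
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write(badchars)

# Read back in binary mode and compare with the expected raw bytes.
with open(path, "rb") as f:
    data = f.read()

print(data == bytes(range(1, 256)))  # True
```

Binary mode with a bytes literal is still the more direct fix; this variant mainly shows that the Python 3 difference comes from the text encoding, not from write() itself.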
