peluzza

Reputation: 329

Pythonic way to hex dump files

Is there a Pythonic way to write this Bash command?

hexdump -e '2/1 "%02x"' file.dat

Obviously, without using os.popen, or any such shortcut ;)

It would be great if the code worked in Python 3.x.

Upvotes: 8

Views: 49335

Answers (4)

杨雪念

Reputation: 31

You can use the following snippet:

def hexdump(data: bytes):
    def to_printable_ascii(byte):
        return chr(byte) if 32 <= byte <= 126 else "."

    offset = 0
    while offset < len(data):
        chunk = data[offset : offset + 16]
        hex_values = " ".join(f"{byte:02x}" for byte in chunk)
        ascii_values = "".join(to_printable_ascii(byte) for byte in chunk)
        print(f"{offset:08x}  {hex_values:<48}  |{ascii_values}|")
        offset += 16

For example:

data = bytes(range(256))  # every byte value from 0x00 to 0xff
hexdump(data)

prints:

00000000  00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f   |................|
00000010  10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f   |................|
00000020  20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f   | !"#$%&'()*+,-./|
00000030  30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f   |0123456789:;<=>?|
00000040  40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f   |@ABCDEFGHIJKLMNO|
00000050  50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f   |PQRSTUVWXYZ[\]^_|
00000060  60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f   |`abcdefghijklmno|
00000070  70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f   |pqrstuvwxyz{|}~.|
00000080  80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f   |................|
00000090  90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f   |................|
000000a0  a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af   |................|
000000b0  b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf   |................|
000000c0  c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf   |................|
000000d0  d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df   |................|
000000e0  e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef   |................|
000000f0  f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff   |................|
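The snippet above needs the whole input in memory as one bytes object. Below is a minimal sketch of a streaming variant that reads a file object 16 bytes at a time and keeps the same line layout. The generator name `hexdump_lines` and its `width` parameter are hypothetical, not part of the original answer.

```python
import io
from typing import BinaryIO, Iterator


def hexdump_lines(f: BinaryIO, width: int = 16) -> Iterator[str]:
    """Yield lines in the same layout as hexdump() above, reading `width` bytes at a time."""
    offset = 0
    # Two-argument iter(): keep calling f.read(width) until it returns b'' (EOF).
    for chunk in iter(lambda: f.read(width), b""):
        hex_values = " ".join(f"{byte:02x}" for byte in chunk)
        ascii_values = "".join(chr(byte) if 32 <= byte <= 126 else "." for byte in chunk)
        # Pad the hex column to width * 3 so a short final line stays aligned.
        yield f"{offset:08x}  {hex_values:<{width * 3}}  |{ascii_values}|"
        offset += len(chunk)


# Demo on in-memory data; pass open('file.dat', 'rb') instead for a real file.
for line in hexdump_lines(io.BytesIO(b"Hello, hexdump!\x00\x01\x02")):
    print(line)
```

Because it only holds one chunk at a time, this should work on files larger than memory; the offset advances by the actual chunk length, so a short last read is handled.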

Upvotes: 3

Raymond Hettinger

Reputation: 226171

The standard library is your friend. Try binascii.hexlify(), which returns the hexadecimal representation of a bytes object.
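A minimal sketch of how binascii.hexlify() could stand in for the command in the question, assuming the goal is one continuous run of lowercase hex digits (the format `2/1 "%02x"` contains no newline). The function name `hexlify_stream` and the script name `hexlify_file.py` are hypothetical; the chunk size defaults to io.DEFAULT_BUFFER_SIZE so large files are not read in one go.

```python
import binascii
import io
import sys
from typing import BinaryIO, TextIO


def hexlify_stream(src: BinaryIO, dst: TextIO, chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> None:
    """Write every byte of src to dst as two lowercase hex digits, with no separators."""
    for chunk in iter(lambda: src.read(chunk_size), b""):
        # hexlify returns bytes such as b'0a1b'; decode so a text stream accepts it
        dst.write(binascii.hexlify(chunk).decode("ascii"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hexlify_file.py file.dat  (roughly: hexdump -e '2/1 "%02x"' file.dat)
    with open(sys.argv[1], "rb") as f:
        hexlify_stream(f, sys.stdout)
```

Since each byte maps to exactly two hex digits, the chunk size does not affect the output; it only bounds memory use.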

Upvotes: 16

Kijewski

Reputation: 26022

Simply read() the whole file and encode('hex') (Python 2 only; in Python 3, bytes objects have no encode method). What could be more Pythonic?

with open('file.dat', 'rb') as f:
    hex_content = f.read().encode('hex')
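In Python 3 the line above raises AttributeError, because bytes has no encode method. A minimal Python 3 sketch of the same one-liner idea, assuming Python 3.5 or later, where bytes.hex() exists (binascii.hexlify(data).decode('ascii') gives the same string on older 3.x). The helper name `file_to_hex` is hypothetical.

```python
def file_to_hex(path: str) -> str:
    """Read the whole file at `path` and return its bytes as a lowercase hex string."""
    with open(path, "rb") as f:
        # bytes.hex() (Python 3.5+) plays the role of Python 2's str.encode('hex')
        return f.read().hex()
```

Like the original, this reads the entire file into memory, so it suits small files.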

Upvotes: 5

abarnert

Reputation: 365577

If you only care about Python 2.x, str.encode('hex') will encode a chunk of binary data as a hex string. So:

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        print chunk.encode('hex')

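(hexdump's default format prints 16 bytes per line, so change that 32 to 16 if you want to match it. Note that the format in the question, '2/1 "%02x"', contains no newline, so it prints the whole file as one continuous line of hex.)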

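If the two-argument iter looks baffling, see the documentation for the built-in iter(); it's not too complicated once you get the idea: iter(callable, sentinel) keeps calling callable until it returns sentinel, here the empty bytes object b'' that read() returns at end of file.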
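A small sketch that makes the two-argument form concrete, using an in-memory io.BytesIO as a stand-in for the file (an assumption so the example runs without file.dat). It also shows the explicit while loop that iter(callable, sentinel) replaces.

```python
import io

# io.BytesIO stands in for open('file.dat', 'rb') so this runs anywhere.
f = io.BytesIO(b"abcdefghij")

# Call f.read(4) repeatedly; stop as soon as it returns b'' (end of data).
chunks = list(iter(lambda: f.read(4), b""))

# The equivalent explicit loop that the two-argument iter() replaces:
f.seek(0)
manual = []
while True:
    chunk = f.read(4)
    if chunk == b"":
        break
    manual.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```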

If you care about Python 3.x, str.encode only works with codecs that convert Unicode strings to bytes, and bytes objects have no encode method at all. For any codec that converts the other way around, or bytes to bytes like 'hex', you have to call codecs.encode explicitly:

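import codecs

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # codecs.encode returns bytes; decode so print shows 0a1b... rather than b'0a1b...'
        print(codecs.encode(chunk, 'hex').decode('ascii'))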

Or, probably better, use binascii.hexlify:

import binascii

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # hexlify also returns bytes; decode for clean printed output
        print(binascii.hexlify(chunk).decode('ascii'))

If you want to do something other than print the lines, and you don't want to read the whole file into memory, you probably want an iterator. You could put the loop in a function and change the print to a yield; that generator function returns exactly the iterator you want. Or use a generator expression or a map call:

import binascii

with open('file.dat', 'rb') as f:
    chunks = iter(lambda: f.read(32), b'')
    # map is lazy: consume hexlines here, before the with block closes the file
    hexlines = map(binascii.hexlify, chunks)
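The "put it in a function and change print to yield" suggestion might look like the following sketch. The name `hex_lines` and the default of 32 bytes per line are illustrative choices, not from the answer. Because it is a generator, the file stays open until the caller finishes iterating or the generator is closed.

```python
import binascii
from typing import Iterator


def hex_lines(path: str, width: int = 32) -> Iterator[str]:
    """Yield the file at `path` as hex strings covering up to `width` bytes each."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(width), b""):
            # hexlify returns bytes; decode so callers get plain str lines
            yield binascii.hexlify(chunk).decode("ascii")
```

Usage would be along the lines of `for line in hex_lines('file.dat'): ...`, doing whatever processing you need per line.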

Upvotes: 14
