Per Persson

Reputation: 185

How to read binary files as hex in Python?

I want to read a file with data, coded in hex format:

01ff0aa121221aff110120...etc

The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).

I tried the following code (and other similar):

filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
  a=f.read(1)
  b=a.encode("hex")
  c.append(b)
f.close()

This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!

This works fine up to (in this case) byte no. 905, which happens to be "1a". I also tried the ord() function, which also stopped at the same byte.

Is there a simple solution?

Upvotes: 11

Views: 70747

Answers (4)

Dmitry Rubanovich

Reputation: 2627

If the file is encoded in hex format, shouldn't each byte be represented by 2 characters? So

c=[]
with open('data.geno','rb') as f:
    b = f.read(2)
    while b:
        c.append(b.decode('hex'))
        b=f.read(2)

or you can even do

with open('data.geno','rb') as f:
    c = list(f.read().decode('hex'))

For example, in Python 2.7.18 this works:

>>> list(b'404040'.decode('hex'))
['@', '@', '@']

This won't work in Python 3. In Python 3 you would use the codecs module:

import codecs
with open('data.geno','rb') as f:
    c = list(map(chr, codecs.decode(f.read(), 'hex')))

or (depending on whether you are looking for them as numbers or as characters)

import codecs
with open('data.geno','rb') as f:
    c = list(codecs.decode(f.read(), 'hex'))

because in Python 3,

>>> import codecs
>>> codecs.decode(b'404040', 'hex')
b'@@@'
>>> list(codecs.decode(b'404040', 'hex'))
[64, 64, 64]
>>> list(map(chr, codecs.decode(b'404040', 'hex')))
['@', '@', '@']

or even ''.join(map(chr, codecs.decode(f.read(), 'hex'))) if you want a string instead of a list.

>>> ''.join(map(chr, codecs.decode(b'404040', 'hex')))
'@@@'

Upvotes: 2

D-slr8

Reputation: 109

Just an additional note to these: make sure to add a break to the loop that calls .read() on the file, or it will just keep going.

def HexView(path):
    with open(path, 'rb') as in_file:
        while True:
            hexdata = in_file.read(16).hex()     # I like to read 16 bytes in then new line it.
            if len(hexdata) == 0:                # breaks loop once no more binary data is read
                break
            print(hexdata.upper())               # I also like it all in caps.
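
If you would rather collect the lines than print them, the same read-until-empty idea can be written as a generator. This is only a sketch: the name hex_lines and the width parameter are made up for illustration, and it assumes Python 3.5+ for bytes.hex().

```python
from functools import partial

def hex_lines(path, width=16):
    """Yield the file's contents as uppercase hex strings, `width` bytes per line."""
    with open(path, 'rb') as in_file:
        # iter(callable, sentinel) keeps calling read(width) and stops
        # as soon as it returns b'' (end of file), so no explicit break is needed.
        for chunk in iter(partial(in_file.read, width), b''):
            yield chunk.hex().upper()
```

Usage would be something like `for line in hex_lines('data.geno'): print(line)`.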

Upvotes: 3

ShadowRanger

Reputation: 155363

Simple solution is binascii:

import binascii

# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())

This just gets you the hex values as one long string (a str in Python 2, a bytes object in Python 3), but it does it much faster than what you're trying to do. If you really want a bunch of length 2 strings of the hex for each byte, you can convert the result easily:

hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))

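which will produce the list of len 2 strs corresponding to the hex encoding of each byte (on Python 2; on Python 3, map returns a lazy iterator and hexdata has to be decoded to str first, e.g. with hexdata.decode('ascii')). To avoid temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by using the same iterator twice with zip: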

hexlist = map(''.join, zip(*[iter(hexdata)]*2))

Update:

For people on Python 3.5 and higher, bytes objects spawned a .hex() method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:

with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
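
On Python 3 you can also skip the intermediate hex string and build the per-byte list directly, since iterating over a bytes object yields integers. A minimal sketch, where read_hex_pairs is a hypothetical helper name and not part of any library:

```python
def read_hex_pairs(path):
    """Return one two-character lowercase hex string per byte in the file."""
    with open(path, 'rb') as f:
        # Each `byte` is an int in range(256); '02x' pads to two hex digits.
        return [format(byte, '02x') for byte in f.read()]
```

For a file containing the bytes 01 ff 0a 1a this should give ['01', 'ff', '0a', '1a'], which is the shape the question asks for.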

Upvotes: 32

Per Persson

Reputation: 185

Thanks for all the interesting answers!

The simple solution that worked immediately was to change "r" to "rb", so:

f=open('data.geno','r')  # doesn't work
f=open('data.geno','rb')  # works fine

The code in this case is actually only two binary bits, so one byte contains four data values, each one of binary 00, 01, 10 or 11.
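
In case it helps anyone with the same kind of data, here is a rough sketch of splitting one byte into its four 2-bit values. The function name unpack_2bit is made up, and the bit order is an assumption: I don't know which end of the byte the file format puts first, so the msb_first flag lets you flip it.

```python
def unpack_2bit(byte, msb_first=True):
    """Split one byte (0-255) into four 2-bit values, each in 0-3."""
    # Assumption: the first value sits in the two highest bits.
    # Pass msb_first=False if the format stores it in the two lowest bits.
    shifts = (6, 4, 2, 0) if msb_first else (0, 2, 4, 6)
    return [(byte >> shift) & 0b11 for shift in shifts]
```

With a file opened in 'rb' mode on Python 3, something like `[v for byte in f.read() for v in unpack_2bit(byte)]` would flatten the whole file into 2-bit values.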

Yours!

Upvotes: 0
