user1225343

Reputation: 85

python literal binary to hex conversion

I have a text file containing a sequence of bits, in ASCII:

cat myFile.txt
0101111011100011001...

I would like to write this to another file in binary mode, so that I can read it in a hex editor. How can I achieve that? I have already tried to convert it with code like:

f2=open(fileOut, 'wb')
    with open(fileIn) as f:
      while True:
            c = f.read(1)
            byte = byte+str(c)
            if not c:
                print "End of file"
                break
            if count % 8 is 0:
                count = 0 
                print hex(int(byte,2))
                try:
                    f2.write('\\x'+hex(int(byte,2))[2:]).zfill(2)
                except:
                     pass
                byte = ''
            count += 1

but that didn't achieve what I planned to do. Do you have any hints?

Upvotes: 2

Views: 634

Answers (2)

unutbu

Reputation: 880687

  • Reading and writing one byte at a time is painfully slow. You can get a roughly 40x speedup of your code simply by handling more data per call to f.read and f.write:

    |------------------+--------------------|
    | using_loop_20480 | 8.34 msec per loop | 
    | using_loop_8     | 354 msec per loop  |
    |------------------+--------------------|
    

    using_loop is the code shown at the bottom of this post. using_loop_20480 is the code with chunksize = 1024*20. This means that 20480 bytes are read from the file at a time. using_loop_8 is the same code with chunksize = 8.

  • Regarding count % 8 is 0: Don't use is to compare numerical values; use == instead. is tests object identity, not equality, so it can give you wrong results (maybe not in the code you posted, but in general, is is not appropriate here):

    In [5]: 1L is 1
    Out[5]: False
    
    In [6]: 1L == 1
    Out[6]: True
    
    In [7]: 0.0 is 0
    Out[7]: False
    
    In [8]: 0.0 == 0
    Out[8]: True
    
  • Instead of

    struct.pack('{n}B'.format(n = len(bytes)), *bytes)
    

    you could use

    bytearray(bytes)
    

    Not only is it less typing, it is slightly faster too.

    |------------------------------+--------------------|
    |             using_loop_20480 | 8.34 msec per loop |
    | using_loop_with_struct_20480 | 8.59 msec per loop |
    |------------------------------+--------------------|
    

    A bytearray is a good match for this job because it bridges the gap between regarding the data as a string and as a sequence of numbers.

    In [16]: bytearray([97,98,99])
    Out[16]: bytearray(b'abc')
    
    In [17]: print(bytearray([97,98,99]))
    abc
    

    As you can see above, bytearray(bytes) allows you to define the bytearray by passing it a sequence of ints (in range(256)), and allows you to write it out as though it were a string: g.write(bytearray(bytes)).
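As a quick check of the equivalence described above, here is a minimal sketch, assuming Python 3 (the session output above is from Python 2). The name values is only illustrative; it stands in for the list called bytes in this answer, which would shadow the built-in bytes type in Python 3.

```python
import struct

# Three example byte values; each must be in range(256).
values = [97, 98, 99]

# Pack with struct: the format string '3B' means three unsigned bytes.
packed = struct.pack('{n}B'.format(n=len(values)), *values)

# Build the same raw bytes directly from the sequence of ints.
direct = bytes(bytearray(values))

print(packed == direct)  # True: both are b'abc'
```

Either result can be passed to the write method of a file opened in 'wb' mode.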


def using_loop(output, chunksize):
    with open(filename, 'r') as f, open(output, 'wb') as g:
        while True:
            chunk = f.read(chunksize)
            if chunk == '':
                break
            bytes = [int(chunk[i:i+8], 2)
                     for i in range(0, len(chunk), 8)]
            g.write(bytearray(bytes))

Make sure chunksize is a multiple of 8; otherwise a chunk boundary falls in the middle of a byte and the bits are grouped incorrectly.
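For readers on Python 3, the loop above might be adapted roughly as follows. This is a sketch under stated assumptions, not a tested replacement: the names bits_to_bytes and convert_file are hypothetical, the input is assumed to contain only the characters 0 and 1 apart from optional trailing whitespace, and a final group shorter than 8 bits is converted as-is (as in the loop above).

```python
def bits_to_bytes(bit_string):
    """Convert a string of '0'/'1' characters to bytes, 8 bits per byte."""
    return bytes(int(bit_string[i:i + 8], 2)
                 for i in range(0, len(bit_string), 8))


def convert_file(src, dst, chunksize=1024 * 20):
    """Read ASCII bits from src in chunks and write raw bytes to dst."""
    if chunksize % 8 != 0:
        raise ValueError("chunksize must be a multiple of 8")
    with open(src, 'r') as f, open(dst, 'wb') as g:
        while True:
            # Assumption: whitespace only appears at the end of the file.
            chunk = f.read(chunksize).strip()
            if not chunk:
                break
            g.write(bits_to_bytes(chunk))


print(bits_to_bytes('0100100001101001'))  # b'Hi' (0x48, 0x69)
```

Calling convert_file('myFile.txt', 'out.bin') would then produce a file that a hex editor can open; the output name out.bin is only an example.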


This is the code I used to create the tables. Note that prettytable also does something similar to this, and it may be advisable to use their code rather than my hack: table.py

This is the module I used to time the code: utils_timeit.py. (It uses table.py).

And here is the code I used to time using_loop (and other variants): timeit_bytearray_vs_struct.py

Upvotes: 2

Jaime

Reputation: 67487

Use struct:

import struct
...
f2.write(struct.pack('b', int(byte,2))) # signed 8 bit int

or

f2.write(struct.pack('B', int(byte,2))) # unsigned 8 bit int
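The choice between the two format codes matters for this input. A short sketch, assuming Python 3 and an example 8-bit string, suggests that 'B' is the safer pick here, because an 8-bit string can encode values up to 255 while 'b' only accepts -128 to 127:

```python
import struct

byte = '11111111'        # eight ASCII bits, as read from the input file
value = int(byte, 2)     # 255

# Unsigned format: accepts 0..255, so every 8-bit string fits.
unsigned = struct.pack('B', value)

# Signed format: 255 is out of range and struct raises an error.
try:
    struct.pack('b', value)
    signed_ok = True
except struct.error:
    signed_ok = False

print(unsigned, signed_ok)  # b'\xff' False
```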

Upvotes: 1
