Reputation: 85
I have a text file containing a sequence of bits, in ASCII:
cat myFile.txt
0101111011100011001...
I would like to write this to another file in binary mode, so that I can read it in a hex editor. How can I achieve that? I have already tried to convert it with code like this:
    f2=open(fileOut, 'wb')
    with open(fileIn) as f:
        while True:
            c = f.read(1)
            byte = byte+str(c)
            if not c:
                print "End of file"
                break
            if count % 8 is 0:
                count = 0
                print hex(int(byte,2))
                try:
                    f2.write('\\x'+hex(int(byte,2))[2:]).zfill(2)
                except:
                    pass
                byte = ''
            count += 1
but that didn't achieve what I planned to do. Do you have any hints?
Upvotes: 2
Views: 634
Reputation: 880687
Reading and writing one byte at a time is painfully slow. You can get a speedup of about 42x (354 msec vs. 8.34 msec in the table below) simply by reading and writing more data per call to f.read and f.write:
|------------------+--------------------|
| using_loop_20480 | 8.34 msec per loop |
| using_loop_8     | 354 msec per loop  |
|------------------+--------------------|
using_loop is the code shown at the bottom of this post. using_loop_20480 is that code with chunksize = 1024*20, which means that 20480 bytes are read from the file at a time. using_loop_8 is the same code with chunksize = 8.
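As a rough illustration of how such a comparison could be reproduced, here is a minimal sketch using the standard timeit module. It is not the timing script used for the table above: the helper name convert, the test data, and the use of in-memory streams instead of real files are all assumptions made for the example, and it is written for Python 3. In-memory streams will likely show a smaller gap than file I/O, so the numbers it prints should not be expected to match the table.

```python
import io
import timeit

def convert(bit_text, chunksize):
    """Convert a string of '0'/'1' characters to bytes,
    reading chunksize characters per call to read()."""
    src = io.StringIO(bit_text)   # stands in for the input text file
    dst = io.BytesIO()            # stands in for the output file opened with 'wb'
    while True:
        chunk = src.read(chunksize)
        if chunk == '':
            break
        values = [int(chunk[i:i + 8], 2) for i in range(0, len(chunk), 8)]
        dst.write(bytearray(values))
    return dst.getvalue()

# Hypothetical test data: 80,000 bit characters (10,000 output bytes).
bit_text = '01011110' * 10000

for chunksize in (8, 1024 * 20):
    seconds = timeit.timeit(lambda: convert(bit_text, chunksize), number=3)
    print('chunksize = %5d: %.4f s for 3 runs' % (chunksize, seconds))
```

Both chunk sizes produce the same output; only the number of read and write calls differs.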
Regarding count % 8 is 0: Don't use is to compare numerical values; use == instead. Here are some examples of why is may give you wrong results (maybe not in the code you posted, but in general, is is not appropriate here):
In [5]: 1L is 1
Out[5]: False
In [6]: 1L == 1
Out[6]: True
In [7]: 0.0 is 0
Out[7]: False
In [8]: 0.0 == 0
Out[8]: True
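The 1L literal above exists only in Python 2. The same point can be sketched with plain variables in a form that should run on both Python 2 and Python 3; the variable names here are made up for the example.

```python
count = 8
remainder = count % 8      # the int 0
zero_float = 0.0

# == compares values, so an int zero and a float zero are equal.
print(remainder == 0)            # True
print(zero_float == remainder)   # True

# "is" compares object identity; an int and a float are never the same object.
print(zero_float is remainder)   # False
```

Whether two equal ints are the same object depends on interpreter details such as small-integer caching, which is why == is the safe choice for the modulo test.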
Instead of
struct.pack('{n}B'.format(n = len(bytes)), *bytes)
you could use
bytearray(bytes)
Not only is it less typing, it is also slightly faster:
|------------------------------+--------------------|
| using_loop_20480             | 8.34 msec per loop |
| using_loop_with_struct_20480 | 8.59 msec per loop |
|------------------------------+--------------------|
A bytearray is a good match for this job because it bridges the gap between regarding the data as a string and as a sequence of numbers.
In [16]: bytearray([97,98,99])
Out[16]: bytearray(b'abc')
In [17]: print(bytearray([97,98,99]))
abc
As you can see above, bytearray(bytes) allows you to define the bytearray by passing it a sequence of ints (in range(256)), and allows you to write it out as though it were a string: g.write(bytearray(bytes)).
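To make that concrete, here is a small sketch that builds a bytearray from ints and writes it to a binary stream. The values are the first two bytes of the sample bit string in the question (01011110 and 11100011); io.BytesIO is used only as a stand-in for a file opened with 'wb', which is an assumption made to keep the example self-contained.

```python
import io

# First 16 bits of the sample input: 01011110 11100011
values = [int('01011110', 2), int('11100011', 2)]   # [94, 227]
buf = bytearray(values)

out = io.BytesIO()          # stands in for open(output, 'wb')
out.write(buf)
print(out.getvalue() == b'\x5e\xe3')   # True

# Each int must be in range(256); anything else is rejected.
try:
    bytearray([256])
except ValueError as exc:
    print('ValueError: %s' % exc)
```

In a hex editor, the resulting two bytes would show up as 5E E3.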
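    def using_loop(output, chunksize):
        # filename is expected to be defined elsewhere (the path of the input text file)
        with open(filename, 'r') as f, open(output, 'wb') as g:
            while True:
                chunk = f.read(chunksize)
                if chunk == '':
                    break
                # convert each group of 8 bit characters to an int in range(256)
                bytes = [int(chunk[i:i+8], 2)
                         for i in range(0, len(chunk), 8)]
                g.write(bytearray(bytes))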
Make sure chunksize is a multiple of 8.
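The reason for that requirement can be shown with a short sketch: if a chunk boundary falls in the middle of an 8-bit group, the group is split into two shorter pieces and each piece is converted to its own byte. The helper to_bytes below is hypothetical; it mimics the chunked loop on an in-memory string rather than a file.

```python
def to_bytes(bit_text, chunksize):
    """Mimic the chunked loop on an in-memory string instead of a file."""
    values = []
    for start in range(0, len(bit_text), chunksize):
        chunk = bit_text[start:start + chunksize]
        values.extend(int(chunk[i:i + 8], 2) for i in range(0, len(chunk), 8))
    return bytearray(values)

bits = '0000000100000010'          # two bytes: 0x01, 0x02

print(list(to_bytes(bits, 8)))     # [1, 2]
print(list(to_bytes(bits, 12)))    # [1, 0, 2]  (second byte split across chunks)
```

With chunksize = 12, the second byte 00000010 is cut into 0000 and 0010, which are written as the separate bytes 0 and 2.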
This is the code I used to create the tables. Note that prettytable also does something similar to this, and it may be advisable to use their code rather than my hack: table.py
This is the module I used to time the code: utils_timeit.py. (It uses table.py).
And here is the code I used to time using_loop (and other variants): timeit_bytearray_vs_struct.py
Upvotes: 2