Reputation: 373
I have a huge binary file (several GB) with the following data format:
Four consecutive bytes form one composite data point (32 bits), which consists of:
b0-b3: 4 flag bits
b4-b17: 14-bit signed integer
b18-b31: 14-bit signed integer
I need to access both signed integers and the flag bits separately and append them to a list or some smarter data structure (not yet decided). At the moment I'm using the following code to read it in:
from collections import namedtuple

DataPackage = namedtuple('DataPackage', ['ie', 'if1', 'if2', 'if3', 'quad2', 'quad1'])


def _unpack_integer(bits):
    # Interpret a string of '0'/'1' characters as a two's-complement integer
    value = int(bits, 2)
    if bits[0] == '1':
        value -= (1 << len(bits))
    return value


def unpack(data):
    # Build a 32-character bit string from the 4 input bytes
    bits = ''.join(['{0:08b}'.format(b) for b in bytearray(data)])
    # Compare against '1': bool('0') would be True for any non-empty string
    flags = [bits[i] == '1' for i in range(4)]
    quad2 = _unpack_integer(bits[4:18])
    quad1 = _unpack_integer(bits[18:])
    return DataPackage(flags[0], flags[1], flags[2], flags[3], quad2, quad1)


def read_file(filename, datapoints=None):
    data = []
    i = 0
    with open(filename, 'rb') as fh:
        value = fh.read(4)
        while value:
            dp = unpack(value)
            data.append(dp)
            value = fh.read(4)
            i += 1
            if i % 10000 == 0:
                print('Read: %d kB' % (float(i) * 4.0 / 1000.0))
            # Stop early once the requested number of data points is reached
            if datapoints and i == datapoints:
                break
    return data


if __name__ == '__main__':
    data = read_file('test.dat')
This code works, but it's too slow for my purposes (2 s for 100k data points of 4 bytes each). I would need a speedup of at least a factor of 10.
The profiler says that the code spends its time mostly in string formatting (to get the bits) and in _unpack_integer().
Unfortunately I am not sure how to proceed here. I'm thinking about either using Cython or writing some C code directly to do the reading. I also tried PyPy, and it gave me a huge performance gain, but unfortunately the code needs to be compatible with a bigger project which doesn't work with PyPy.
Upvotes: 0
Views: 346
Reputation: 373
Thanks to the hint by Jean-François Fabre I found a suitable solution using bitmasks, which gives me a speedup of a factor of 6 compared to the code in the question. It now has a throughput of around 300k data points/s.
I also dropped the admittedly nice named tuples and replaced them with plain return values, because I found out they were a bottleneck as well.
The code now looks like this:
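import struct

# One mask per flag bit, from the most significant bit (bit 31) down to bit 28
masks = [2**(31-i) for i in range(4)]


def unpack3(data):
    # Interpret the 4 bytes as one big-endian unsigned 32-bit integer
    data = struct.unpack('>I', data)[0]
    quad2 = (data & 0xfffc000) >> 14
    quad1 = data & 0x3fff
    # Sign-extend the 14-bit two's-complement values
    if (quad2 & (1 << (14 - 1))) != 0:
        quad2 = quad2 - (1 << 14)
    if (quad1 & (1 << (14 - 1))) != 0:
        quad1 = quad1 - (1 << 14)
    # Each flag is 0 if unset, or the (non-zero) mask value if set
    flag0 = data & masks[0]
    flag1 = data & masks[1]
    flag2 = data & masks[2]
    flag3 = data & masks[3]
    return flag0, flag1, flag2, flag3, quad2, quad1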
The line profiler says:
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    58                                           @profile
    59                                           def unpack3(data):
    60   1000000      3805727      3.8     12.3      data = struct.unpack('>I', data)[0]
    61   1000000      2670576      2.7      8.7      quad2 = (data & 0xfffc000) >> 14
    62   1000000      2257150      2.3      7.3      quad1 = data & 0x3fff
    63   1000000      2634679      2.6      8.5      if (quad2 & (1 << (14 - 1))) != 0:
    64    976874      2234091      2.3      7.2          quad2 = quad2 - (1 << 14)
    65   1000000      2660488      2.7      8.6      if (quad1 & (1 << (14 - 1))) != 0:
    66    510978      1218965      2.4      3.9          quad1 = quad1 - (1 << 14)
    67   1000000      3099397      3.1     10.0      flag0 = data & masks[0]
    68   1000000      2583991      2.6      8.4      flag1 = data & masks[1]
    69   1000000      2486619      2.5      8.1      flag2 = data & masks[2]
    70   1000000      2473058      2.5      8.0      flag3 = data & masks[3]
    71   1000000      2742228      2.7      8.9      return flag0, flag1, flag2, flag3, quad2, quad1
So there is not one clear bottleneck anymore. Probably now it's as fast as it gets in pure Python. Or does anyone have an idea for further speedup?
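One direction that stays in the standard library, offered as a sketch rather than a measured improvement: read the file in large blocks, let struct.iter_unpack walk over the words instead of calling struct.unpack once per data point, and replace the two sign-extension branches with the branch-free expression (x ^ 0x2000) - 0x2000. The names WORD, unpack_chunk, read_file_chunked and chunk_words are made up for this illustration, and the flags come back as 0/1 rather than as masked values. Whether this is actually faster would need to be profiled on the real data.

```python
import struct

# Hypothetical names throughout; '>I' is the same big-endian word format
# used by unpack3.
WORD = struct.Struct('>I')


def unpack_chunk(chunk):
    """Decode a bytes object whose length is a multiple of 4."""
    out = []
    append = out.append  # local alias avoids repeated attribute lookups
    for (word,) in WORD.iter_unpack(chunk):
        # (x ^ 0x2000) - 0x2000 sign-extends a 14-bit two's-complement value
        quad2 = (((word >> 14) & 0x3FFF) ^ 0x2000) - 0x2000
        quad1 = ((word & 0x3FFF) ^ 0x2000) - 0x2000
        append(((word >> 31) & 1, (word >> 30) & 1,
                (word >> 29) & 1, (word >> 28) & 1, quad2, quad1))
    return out


def read_file_chunked(filename, chunk_words=65536):
    """Read the file in blocks of chunk_words 4-byte words."""
    data = []
    with open(filename, 'rb') as fh:
        while True:
            chunk = fh.read(4 * chunk_words)
            if not chunk:
                break
            # Ignore a trailing partial word; iter_unpack would raise on it
            usable = len(chunk) - len(chunk) % 4
            data.extend(unpack_chunk(chunk[:usable]))
    return data
```

A quick sanity check is to pack a known word, for example struct.pack('>I', (0b1010 << 28) | ((-3 & 0x3FFF) << 14) | 5), and confirm that unpack_chunk returns the flags 1, 0, 1, 0 followed by -3 and 5.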
Upvotes: 1
Reputation: 991
I would recommend trying ctypes, especially if you already have a C/C++ library that recognizes the data structure. The benefit is that the data structures remain available to your Python code while the loading is fast. If you already have a C library to load the data, you can call its functions to do the heavy lifting and just map the data into your Python structures. I'm sorry I won't be able to try this out and provide proper code for your example (perhaps someone else can), but here are a couple of tips to get you started:
My take on how one might create bit vectors in python: https://stackoverflow.com/a/40364970/262108
The approach I mentioned above, applied to a problem similar to the one you described. Here I use ctypes to create a ctypes data structure (which lets me use the object like any other Python object) while also being able to pass it along to a C library:
https://gist.github.com/lonetwin/2bfdd41da41dae326afb
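As a rough illustration of the ctypes idea (not taken from the gist above), the 32-bit data point from the question could be described with bit fields on a ctypes.BigEndianStructure. This is only a sketch: the class name DataPoint and the field names are made up here, it assumes the word is stored big-endian with the flag bits in the most significant positions, and bit-field layout in ctypes follows compiler conventions, so it is worth checking the result against a word whose contents you know.

```python
import ctypes
import struct


class DataPoint(ctypes.BigEndianStructure):
    # Hypothetical mapping of the layout described in the question:
    # 4 flag bits, then two 14-bit signed integers, 32 bits in total.
    _fields_ = [
        ('flags', ctypes.c_uint32, 4),
        ('quad2', ctypes.c_int32, 14),
        ('quad1', ctypes.c_int32, 14),
    ]


# Build one known word: flags 0b1010, quad2 = -3, quad1 = 5
raw = struct.pack('>I', (0b1010 << 28) | ((-3 & 0x3FFF) << 14) | 5)

# Copy the 4 bytes into a structure instance and read the fields back
dp = DataPoint.from_buffer_copy(raw)
print(dp.flags, dp.quad2, dp.quad1)
```

The same structure type can be handed to a C function through ctypes, which is where the real speedup would be expected; reading the fields one by one from Python is unlikely to beat plain bitmasks.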
Upvotes: 1