unpacking binary file using struct.unpack VS np.frombuffer VS np.ndarray VS np.fromfile

Question

I am unpacking large binary files (~1GB) with many different datatypes. I am in the early stages of creating the loop to covert each byte. I have been using struct.unpack, but recently thought it would run faster if I utilized numpy. However switching to numpy has slowed down my program. I have tried:

struct.unpack
np.fromfile
np.frombuffer
np.ndarray

note:in the np.fromfile method I leave the file open and don't load it into memory and seek through it

1)

with open(file="file_loc" , mode='rb') as file: 
    RAW = file.read()
byte=0
len = len(RAW)
while( byte < len):
    header = struct.unpack(">HHIH", RAW[byte:(byte+10)])
    size = header[1]
    loc  = str(header[3])
    data[loc] = struct.unpack(">B", RAW[byte+10:byte+size-10)
    byte+=size

2)

dt=('>u2,>u2,>u4,>u2')
with open(file="file_loc" , mode='rb') as RAW:
    same loop as above:
        header = np.fromfile(RAW[byte:byte+10], dtype=dt, count=1)[0]
        data   = np.fromfile(RAW[byte+10:byte+size-10], dtype=">u1", count=size-10)

3)

dt=('>u2,>u2,>u4,>u2')
with open(file="file_loc" , mode='rb') as file:
    RAW = file.read()
same loop:
    header = np.ndarray(buffer=RAW[byte:byte+10], dtype=dt_header, shape= 1)[0]
    data   = np.ndarray(buffer=RAW[byte+10:byte+size-10], dtype=">u1", shape=size-10)

4) pretty much the same as 3 except using np.frombuffer()

All of the numpy implementations process at about half the speed as the struct.unpack method, which is not what I expected.

Let me know if there is anything I can do to improve performance.

also, I just typed this from memory so it might have some errors.

unpacking binary file using struct.unpack VS np.frombuffer VS np.ndarray VS np.fromfile

Answers (1)

Related Questions