Reputation: 637

seek() while reading in binary files

I'm an uber-beginner with Python; I've rather been thrown into the deep end. A bit of background: the files we're reading come from a sonar imaging camera. At the moment I'm trying to read attributes written into the files, such as date, filename, number of frames, number of beams, etc. First, I'd like to read in the FILE header. Then, for each frame, I'd like to read in the FRAME header. I need to start reading the frame headers where the file header leaves off, and I believe I need seek() to do this. Here's the code I have currently, which reads the file header (successfully) and is meant to begin where that information ends for the frame headers:
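A rough sketch of how seek() could jump straight to a given frame header. An in-memory io.BytesIO stands in for the open .ddf file so the example runs on its own. It assumes the file header occupies the first 376 bytes (where the hand-worked layout below ends) and that every frame has the same size; FRAME_HEADER_SIZE and FRAME_SIZE are made-up values for illustration, not taken from the DIDSON format.

```python
import io

FILE_HEADER_SIZE = 376   # assumption: header ends after 'frameinterval' (bytes 372:376)
FRAME_HEADER_SIZE = 8    # hypothetical frame-header size, for illustration only
FRAME_SIZE = 32          # hypothetical total bytes per frame (header + image data)


def read_frame_header(f, frame_index):
    """Seek straight to frame `frame_index` and return its raw header bytes."""
    offset = FILE_HEADER_SIZE + frame_index * FRAME_SIZE
    f.seek(offset)                      # absolute position (whence defaults to 0)
    return f.read(FRAME_HEADER_SIZE)


# Fake file: a zero-filled header followed by 3 frames whose first byte is the frame number.
fake = bytearray(FILE_HEADER_SIZE + 3 * FRAME_SIZE)
for i in range(3):
    fake[FILE_HEADER_SIZE + i * FRAME_SIZE] = i
f = io.BytesIO(bytes(fake))             # with a real file: open(path, 'rb')

headers = [read_frame_header(f, i) for i in range(3)]
print([h[0] for h in headers])          # [0, 1, 2]
```

Because each offset is computed from the frame index, frames can be read in any order; if frames turn out to vary in size, the offsets would have to be accumulated as you go instead.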

EDITED CODE:

import struct

# Field offsets were worked out by hand from a DIDSON .ddf file.
# '<' = little-endian with standard sizes, so 'l', 'i' and 'f' are all
# 4 bytes on every platform (native 'l' is 8 bytes on 64-bit Linux).
def get_file_header(data, offset=0):
    fileheader = {}
    winlengths = [1.125, 2.25, 4.5, 9, 18, 36]  # window lengths; not used yet

    def field(fmt, start):
        # unpack always returns a tuple; [0] takes out the single value
        return struct.unpack_from('<' + fmt, data, offset + start)[0]

    fileheader['filetype'] = field('3s', 0)
    fileheader['fileversion'] = field('B', 3)
    fileheader['numframes'] = field('l', 4)
    fileheader['framerate'] = field('l', 8)
    fileheader['resolution'] = field('i', 12)
    fileheader['numbeams'] = field('i', 16)
    fileheader['samplerate'] = field('f', 20)
    fileheader['samplesperchannel'] = field('l', 24)
    fileheader['receivergain'] = field('l', 28)
    fileheader['windowstart'] = field('i', 32)
    fileheader['winlengthsindex'] = field('i', 36)
    fileheader['reverse'] = field('l', 40)
    fileheader['serialnumber'] = field('l', 44)
    fileheader['date'] = field('10s', 48)
    #fileheader['???'] = field('26s', 58)
    fileheader['idstring'] = field('33s', 84)
    #fileheader['????2'] = field('235s', 117)
    fileheader['framestart'] = field('i', 352)
    fileheader['frameend'] = field('i', 356)
    fileheader['timelapse'] = field('i', 360)
    fileheader['recordInterval'] = field('i', 364)
    fileheader['radioseconds'] = field('i', 368)
    fileheader['frameinterval'] = field('i', 372)

    return fileheader


# datagram_size (bytes per frame) is not known yet, so it is passed in explicitly
def num_datagrams(didson_data, datagram_size):
    assert len(didson_data) % datagram_size == 0
    return len(didson_data) // datagram_size

def get_offset(datagram_number, datagram_size):
    return datagram_number * datagram_size

def didson_print(fileheader):
    for key in sorted(fileheader):
        print('  %s %r' % (key, fileheader[key]))


def main():
    with open('C:/vprice/DIDSON/DIDSON Data/test.ddf', 'rb') as didson_file:
        didson_data = didson_file.read()
    fileheader = get_file_header(didson_data)
    didson_print(fileheader)


if __name__ == '__main__':
    main()

Now if I run "main", will I be able to read line by line? I'm not sure if it is one value per line... I basically went through the file byte by byte to work out where each header value is located.
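A binary file has no "lines": each header value sits at a fixed byte offset. One way to make that explicit is a (name, format, offset) table fed to struct.unpack_from. This sketch lists only a subset of the fields worked out above, and the '<' (little-endian, standard sizes) prefix is an assumption about how the camera writes its files; the packed test buffer is fabricated for the demo.

```python
import struct

# (name, struct format, byte offset) for a subset of the hand-worked layout.
HEADER_FIELDS = [
    ('filetype', '3s', 0),
    ('fileversion', 'B', 3),
    ('numframes', 'l', 4),
    ('framerate', 'l', 8),
    ('numbeams', 'i', 16),
    ('date', '10s', 48),
    ('frameinterval', 'i', 372),
]


def parse_fields(data, fields, base=0):
    """Unpack each field at its fixed offset (relative to `base`)."""
    header = {}
    for name, fmt, offset in fields:
        # '<' assumes little-endian data with standard 4-byte 'l'/'i'
        header[name] = struct.unpack_from('<' + fmt, data, base + offset)[0]
    return header


# Fabricated 376-byte header for the demo.
buf = bytearray(376)
struct.pack_into('<3sBl', buf, 0, b'DDF', 3, 1200)   # bytes 0-2, 3, 4-7
struct.pack_into('<i', buf, 16, 48)                   # numbeams
h = parse_fields(bytes(buf), HEADER_FIELDS)
print(h['filetype'], h['fileversion'], h['numframes'], h['numbeams'])
```

The `base` parameter lets the same function parse a header that starts partway into a buffer, for example a frame header after the file header.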

Any help would be appreciated!!

Upvotes: 3

Answers (3)

John S Gruber

Reputation: 238

Why not read just the headers in one go, rather than the whole file? Then your file will be positioned ready to start reading the data past the headers. It looks like changing the read from:

didson_data=didson_file.read()

pos=didson_file.seek(0,0)

to just:

didson_data=didson_file.read(376)

would do that, leaving the position at decimal offset 376, right after the frameinterval header field (bytes 372 to 375).
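A small check of this idea, using io.BytesIO in place of the real file (with a real file you would use open(path, 'rb')). The 376-byte header size follows the layout in the question; the trailing b'FRAMEDATA' and the frameinterval value of 5 are fabricated for the demo.

```python
import io
import struct

HEADER_SIZE = 376  # bytes 0..375 hold the file header in the question's layout

# Fake file: header with frameinterval = 5 at bytes 372:376, then frame data.
fake = bytearray(HEADER_SIZE) + b'FRAMEDATA'
struct.pack_into('<i', fake, 372, 5)
f = io.BytesIO(bytes(fake))

header_bytes = f.read(HEADER_SIZE)                     # read only the header
frameinterval = struct.unpack('<i', header_bytes[372:376])[0]
position = f.tell()                                    # 376: first byte after the header
first_bytes = f.read(5)                                # continues from there: b'FRAME'
print(frameinterval, position, first_bytes)
```

No seek() is needed here: reading exactly the header's length leaves the file positioned at the first frame.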

There's no reason to make this more complicated to save so little memory.

A more general solution for reading the rest of the file in variable-sized chunks, while keeping track of where you are, would be to write your own function. It could read a block big enough to hold the largest possible data element, work out the element's real size, keep just that element, seek to (the offset in the file when the function began) + (the length of the element just retrieved), and then return the element.

Basically:

With the file positioned just past the headers, you would then repeatedly call

def get_chunk(fileobject):
    start = fileobject.tell()
    result = fileobject.read(1024)
    if len(result) == 0:  # End of file
        return None
    # parse_element is a placeholder: work out what this is and how many bytes it used
    thing, length = parse_element(result)
    # Move to just past this element (also correct near EOF, where read() returns < 1024 bytes)
    fileobject.seek(start + length)
    return thing

until it returns None:

while True:
    the_thing = get_chunk(didson_file)
    if the_thing is None:  # End of the file
        break
    # process the_thing
# End the program

Once you get past the headers you will need some way of parsing an object and determining how long it is. The get_chunk function can return objects of different types in Python, so just by looking at the type of the_thing, the "# process the_thing" section could do different things for different kinds of data.
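To make the get_chunk idea concrete, here is a runnable sketch against a made-up format: each element is a 4-byte little-endian length followed by that many payload bytes. That format is purely hypothetical (the DIDSON frame layout would need to come from its documentation), and io.BytesIO stands in for the open file.

```python
import io
import struct


def get_chunk(fileobject):
    """Return the next element's payload, or None at end of file.

    Hypothetical format: 4-byte little-endian length, then the payload.
    """
    start = fileobject.tell()
    block = fileobject.read(1024)          # assumed big enough for any element
    if not block:                          # end of file
        return None
    (length,) = struct.unpack_from('<I', block, 0)
    thing = block[4:4 + length]
    fileobject.seek(start + 4 + length)    # reposition just past this element
    return thing


data = b''.join(struct.pack('<I', len(p)) + p for p in [b'abc', b'', b'hello'])
f = io.BytesIO(data)
things = []
while True:
    the_thing = get_chunk(f)
    if the_thing is None:                  # 'is None', so an empty element doesn't stop the loop
        break
    things.append(the_thing)
print(things)                              # [b'abc', b'', b'hello']
```

Testing with `is None` rather than `not the_thing` matters here: the second element is empty, and `not b''` is true, so the looser test would end the loop early.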

For a true binary file, the readlines function shouldn't be used. Any linefeeds in the data would be accidental, so you wouldn't want to use them to break the file apart. The idea of looking at how line reading works, however, is a good one, but you'd have to adapt what you learn from it rather than copy it. Iterating over a file line by line works like a generator, which is a cool idea, and a generator can remember all kinds of state from one invocation to the next. But since you only need to remember where you are in the file, the get_chunk approach could work and is simpler to understand (though a little less time-efficient).
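The generator idea adapted to binary data might look like this sketch: it yields fixed-size records instead of lines, using the built-in two-argument iter(callable, sentinel), which keeps calling read until it returns b''. It assumes every record (for example, a frame) has the same size, which would need to be confirmed for DIDSON files.

```python
import functools
import io


def iter_records(fileobject, record_size):
    """Yield successive fixed-size records from a binary file object.

    Assumes every record is exactly `record_size` bytes; a short trailing
    piece (e.g. a truncated file) is yielded as-is so the caller can notice it.
    """
    read_record = functools.partial(fileobject.read, record_size)
    for record in iter(read_record, b''):  # stops when read() returns b''
        yield record


f = io.BytesIO(b'AAAABBBBCC')
print(list(iter_records(f, 4)))            # [b'AAAA', b'BBBB', b'CC']
```

The file object itself remembers the position between calls, so the generator needs no bookkeeping of its own.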

Upvotes: 0

steveha

Reputation: 76715

If your file is binary data, and if it is only going to be a few megabytes, you might want to read the whole thing at once. This is what you are doing right now with didson_file.read().

If the file is text data, organized into lines, there is a nice idiom that you can use to conveniently process it one line at a time:

with open("my_file_name") as f:
    for line in f:
        do_something_with_line(line)

Actually, since you have those structs you need to parse, it's pretty clear that you are reading a binary file. In that case, you should either slurp the whole thing (if memory usage isn't a problem), or else read it in chunks (more complex, but keeps memory usage down).
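If you do slurp the whole file, seek() isn't needed at all: an integer offset into the bytes plays the role of the file position, and struct.unpack_from reads directly at that offset. The header and frame sizes below are hypothetical values chosen for the demo, and the "frame number in the first 4 bytes" content is fabricated.

```python
import struct


def frame_offsets(data, header_size, frame_size):
    """Yield the start offset of each whole frame in an in-memory buffer."""
    offset = header_size
    while offset + frame_size <= len(data):
        yield offset
        offset += frame_size


# Hypothetical sizes: 376-byte file header, 16-byte frames whose first
# 4 bytes hold a little-endian frame number.
HEADER, FRAME = 376, 16
data = bytearray(HEADER + 3 * FRAME)
for n in range(3):
    struct.pack_into('<i', data, HEADER + n * FRAME, n)

numbers = [struct.unpack_from('<i', data, off)[0]
           for off in frame_offsets(data, HEADER, FRAME)]
print(numbers)   # [0, 1, 2]
```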

Upvotes: 0

Christian Witts

Reputation: 11585

You read the entire contents of the file into didson_data, then seek the file object didson_file back to zero and never use it again: you split all your fields out of didson_data instead of stepping through lines/chunks of the file. So of course your second .tell() still reports position zero; you haven't moved anywhere since you seeked there.
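A short demonstration of that behaviour, with io.BytesIO standing in for the file opened in 'rb' mode: only reads on the file object move its position, while slicing the bytes you already read does not.

```python
import io

f = io.BytesIO(b'0123456789')   # stands in for open(path, 'rb')
data = f.read()                 # reads everything; position is now at the end
print(f.tell())                 # 10
f.seek(0)                       # back to the start
print(f.tell())                 # 0
fields = data[4:8]              # slicing the bytes does not move the file position
print(f.tell())                 # 0
f.read(4)                       # only reading from f advances it
print(f.tell())                 # 4
```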

Upvotes: 2
