hatmatrix

Reputation: 44862

Segmenting and writing a binary file using Python

I have two binary input files, firstfile and secondfile. secondfile is firstfile + additional material. I want to isolate this additional material in a separate file, newfile. This is what I have so far:

import os
import struct

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes-origbytes

with open(secondfile,'rb') as f:
    first = f.read(origbytes)
    rest = f.read()

Naturally, my inclination is to do this, which seems to work:

with open(newfile,'wb') as f:
    f.write(rest)

I can't find it now, but I thought I read on SO that I should pack the data with struct.pack before writing it to a file. The following gives me an error:

with open(newfile,'wb') as f:
    f.write(struct.pack('%%%ds' % numbytes,rest))

-----> error: bad char in struct format

This works however:

with open(newfile,'wb') as f:
    f.write(struct.pack('c'*numbytes,*rest))

For the versions that work, this check gives me the right answer:

with open(newfile,'rb') as f:
    test = f.read()

len(test)==numbytes

-----> True

Is this the correct way to write a binary file? Another reader program that I feed newfile to tells me the file is corrupted. I want to make sure I'm doing this part correctly, so I can tell whether the second part of the file really is corrupted or whether I am writing it wrong. Thank you.

Upvotes: 0

Views: 1832

Answers (4)

Martin Vilcans

Reputation: 5718

There is no reason to use the struct module, which is for converting between binary formats and Python objects. There's no conversion needed here.

Strings in Python 2.x are just arrays of bytes and can be read from and written to files as they are. (In Python 3.x, the read function returns a bytes object, which is the same thing, provided you open the file with open(filename, 'rb').)

So you can just read the file into a string, then write it again:

import os

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes-origbytes

with open(secondfile,'rb') as f:
    f.seek(origbytes)  # skip the part that is identical to firstfile
    rest = f.read()

with open(newfile,'wb') as f:
    f.write(rest)

Upvotes: 2

Andriy Tylychko

Reputation: 16256

  1. You don't need to read the first origbytes bytes, just move the file pointer to the right position: f.seek(origbytes)
  2. You don't need struct packing, just write rest to newfile.

Upvotes: 1

retracile

Reputation: 12339

If you know that secondfile is the same as firstfile + appended data, why even read in the first part of secondfile?

with open(secondfile,'rb') as f:
    f.seek(origbytes)
    rest = f.read()

As for writing things out,

with open(newfile,'wb') as f:
    f.write(rest)

is just fine. The stuff with struct would just be a no-op anyway. The only thing you might consider is the size of rest. If it could be large, you may want to read and write the data in blocks.
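
The block-wise approach mentioned above could look roughly like this. It is a minimal sketch, not part of the original answer: the helper name copy_tail and the 64 KiB chunk_size default are assumptions chosen for illustration.

```python
def copy_tail(src_path, dst_path, offset, chunk_size=64 * 1024):
    """Copy everything in src_path from byte `offset` onward into dst_path.

    Hypothetical helper: reads and writes in fixed-size blocks so the
    appended data never has to fit in memory all at once.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        src.seek(offset)                 # skip the part shared with firstfile
        while True:
            chunk = src.read(chunk_size)
            if not chunk:                # empty bytes object means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

With the names from the question, the call would be copy_tail(secondfile, newfile, origbytes), and the return value should equal numbytes.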

Upvotes: 3

This is not C; struct format strings do not use %. What you want is:

f.write(struct.pack('%ds' % numbytes,rest))

It worked for me:

>>> struct.pack('%ds' % 5,'abcde')
'abcde'

Explanation: '%%%ds' % 15 is '%15s', while what you want is '%ds' % 15, which is '15s'.
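
Both format strings can be checked directly. This is a sketch for Python 3, where the data passed to the 's' format must be a bytes object rather than a str; the sample value of rest is made up for illustration.

```python
import struct

rest = b'\x00\x01binary data\xff\xfe'   # made-up sample payload
numbytes = len(rest)

bad_fmt = '%%%ds' % numbytes    # '%%' turns into a literal '%' in the result
good_fmt = '%ds' % numbytes     # just the byte count followed by 's'

# struct does not know a '%' format character, so the first form is rejected.
try:
    struct.pack(bad_fmt, rest)
    bad_fmt_rejected = False
except struct.error:
    bad_fmt_rejected = True

# With the valid format, packing raw bytes hands back the same bytes.
packed = struct.pack(good_fmt, rest)
same_as_input = (packed == rest)
```

Because packed is identical to rest, writing rest directly produces the same file as writing the packed value.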

Upvotes: 0
