heltonbiker

Reputation: 27595

Splitting binary file content in two parts using single byte separator in python

I have a file consisting of three parts:

  1. Xml header (unicode);
  2. ASCII character 29 (group separator);
  3. A numeric stream to the end of file

I want to get one xml string from the first part, and the numeric stream (to be parsed with struct.unpack or array.fromfile).

Should I create an empty string and add to it reading the file byte by byte until I find the separator, like shown here?

Or is there a way to read everything at once and use something like xmlstring = open('file.dat', 'rb').read().split(chr(29))[0] (which, by the way, doesn't work)?

EDIT: this is what I see using a hex editor: the separator is there (selected byte)


Upvotes: 0

Views: 4600

Answers (3)

Lukas Graf

Reputation: 32610

Your attempt at searching for the value chr(29) didn't work because in that expression 29 is a value in decimal notation. The value you got from your hex editor however is displayed in hex, so it's 0x29 (or 41 in decimal).

You can simply do the conversion in Python - 0xnn is just another notation for entering an integer literal:

>>> 0x29
41

You can then use str.partition to split the data into your respective parts:

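SEP = chr(0x29)  # separator byte, using the value shown in the hex editor

with open('file.dat', 'rb') as infile:
    data = infile.read()

# partition() splits at the first occurrence of SEP only
xml, sep, binary_data = data.partition(SEP)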

Demonstration:

import random

SEP = chr(0x29)


with open('file.dat', 'wb') as outfile:
    outfile.write("<doc></doc>")
    outfile.write(SEP)
    data = ''.join(chr(random.randint(0, 255)) for i in range(1024))
    outfile.write(data)


with open('file.dat', 'rb') as infile:
    data = infile.read()

xml, sep, binary_data = data.partition(SEP)

print xml
print len(binary_data)

Output:

<doc></doc>
1024
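The demonstration above is Python 2 (note the print statement). As a rough sketch of the same idea in Python 3, where a file opened in 'rb' mode yields bytes, something like the following might work. The helper name split_payload, the UTF-8 encoding of the XML header, and the little-endian 16-bit layout of the numeric stream are all assumptions made for illustration; the question does not specify them. The separator here is the group separator b"\x1d" (decimal 29); swap in b"\x29" if that is the byte the file actually contains.

```python
import struct

# Assumption: the separator is ASCII 29 (group separator, 0x1D).
SEP = b"\x1d"


def split_payload(data, sep=SEP):
    """Split raw bytes into (xml_text, binary_tail) at the first separator."""
    xml_bytes, found, tail = data.partition(sep)
    if not found:
        raise ValueError("separator byte not found")
    # Assumption: the XML header is UTF-8 encoded.
    return xml_bytes.decode("utf-8"), tail


# Build a small in-memory sample instead of reading a real file.
numbers = (1, 2, 300)
sample = "<doc></doc>".encode("utf-8") + SEP + struct.pack("<3h", *numbers)

xml_text, tail = split_payload(sample)

# Assumption: the numeric stream is little-endian signed 16-bit integers.
values = struct.unpack("<%dh" % (len(tail) // 2), tail)
```

Because partition splits only at the first occurrence, separator-valued bytes that happen to appear later in the numeric stream do not affect the result.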

Upvotes: 1

Tui Popenoe

Reputation: 2114

Make sure you are reading the file in before trying to split it. In your code, you don't have a .read()

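with open('file.dat', 'rb') as f:
    data = f.read()
    if chr(29) in data:
        # Raw byte with decimal value 29 (\x1d, the group separator)
        xmlstring = data.split(chr(29))[0]
    elif hex(29) in data:
        # hex(29) is the four-character text '0x1d', not a single byte
        xmlstring = data.split(hex(29))[0]
    else:
        xmlstring = '\x1d not found!'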

Ensure that an ASCII 29 character (\x1d) exists in your file.
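To check which byte is actually present, a small Python 3 sketch along these lines may help. The helper locate_separator and the two candidate values are hypothetical, chosen only to contrast decimal 29 (b'\x1d') with hex 29 (b')', decimal 41). Note that hex(29) returns the text '0x1d', so it is not a candidate for a single-byte search.

```python
GROUP_SEPARATOR = bytes([29])    # decimal 29 -> b'\x1d'
RIGHT_PAREN = bytes([0x29])      # hex 29 (decimal 41) -> b')'


def locate_separator(data, candidates=(GROUP_SEPARATOR, RIGHT_PAREN)):
    """Return (separator, offset) for the first candidate found, else None."""
    for sep in candidates:
        offset = data.find(sep)
        if offset != -1:
            return sep, offset
    return None


# In-memory sample: a short XML fragment, the separator, then two raw bytes.
sample = b"<doc/>" + GROUP_SEPARATOR + b"\x00\x01"
result = locate_separator(sample)
```

Candidates are tried in order, so a file whose XML text contains a literal ')' would still report the group separator first if both are present.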

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798744

mmap the file, search for the 29, create a buffer or memoryview from the first part to feed to the parser, and pass the rest through struct.
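A minimal Python 3 sketch of that approach, under some assumptions: the separator is b"\x1d", the numeric stream is two little-endian doubles, and the helper name split_with_mmap is made up for illustration. For simplicity it slices the map, which copies each part into a bytes object. A memoryview over the map would avoid the copy, but it has to be released before the map is closed.

```python
import mmap
import os
import struct
import tempfile

# Assumption: the separator is ASCII 29 (group separator, 0x1D).
SEP = b"\x1d"


def split_with_mmap(path, sep=SEP):
    """Memory-map the file and split it at the first separator byte."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            pos = mm.find(sep)
            if pos == -1:
                raise ValueError("separator byte not found")
            xml_bytes = mm[:pos]          # slicing copies into bytes
            tail = mm[pos + len(sep):]    # everything after the separator
        finally:
            mm.close()
    return xml_bytes, tail


# Write a small temporary sample file, split it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<doc></doc>" + SEP + struct.pack("<2d", 1.5, -2.0))
    path = tmp.name
try:
    xml_bytes, tail = split_with_mmap(path)
finally:
    os.remove(path)

# Assumption: the tail holds exactly two little-endian doubles.
values = struct.unpack("<2d", tail)
```

Only the pages touched by the search are read from disk, which is the main appeal of mmap when the XML header is small and the numeric stream is large.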

Upvotes: 1
