Reputation: 13
I'm new to Python, so I don't know some basic stuff. I have a binary file which contains an array of objects. The objects stored are traditional C structures. I would like to recreate that structure in Python, read the file contents into a list of its objects, make some modifications to the data, and then store it back. The part I have trouble with is reading the file contents. I've read some similar questions about reading a file, but they didn't answer the questions I have. I've tried defining a class, declaring its members with __slots__, and reading the data with pickle, but it didn't quite work. It may also be relevant that one of the data members is actually an array containing objects of a different structure. What would be the best way to read this file?
Upvotes: 1
Views: 1092
Reputation: 6973
You'll need to use Python's struct.unpack(). You'll need to know precisely what types the fields are and how they are packed on disk. pickle is specific to Python's own serialization format and won't be of any use to you, unless you are converting the data to something Python-specific.
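To show why the on-disk layout matters, here is a minimal sketch. The C struct in the comment is hypothetical and not taken from the question. The same three fields occupy a different number of bytes depending on whether the format string asks for a packed layout (the < prefix) or the platform's native alignment (the @ prefix):

```python
import struct

# Hypothetical C struct, assumed for illustration only:
#   struct record { char tag; int32_t id; double value; };

# '<' = little-endian, standard sizes, no padding (a "packed" struct).
PACKED = struct.Struct('<cid')
# '@' = native byte order, sizes and alignment, i.e. what a C compiler
# typically produces by default; the resulting size is platform-dependent.
NATIVE = struct.Struct('@cid')

raw = PACKED.pack(b'A', 7, 2.5)
tag, rec_id, value = PACKED.unpack(raw)

print(PACKED.size)  # 1 + 4 + 8 bytes, no padding
print(NATIVE.size)  # usually larger because of alignment padding
```

If the file was written by dumping structs straight from memory, the native form is more likely to match; if the writer packed the fields explicitly, a fixed byte order with no padding is the safer guess.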
I recently answered a vaguely similar question here that showed how to use mmap() to map the file into memory, which you may find more convenient than os.read().
I would probably start by creating a class whose constructor takes some combination of the file pointer, the mmap object, and the offset.
The __init__() method would then read the structure and initialize attributes of self with its unpacked contents. Then add accessor methods to modify those attributes, and a save() method to write them all back out using struct.pack() with mmap or os.write().
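Here is an example adapted from the Python docs of packing and unpacking three integers (two 16-bit shorts followed by a 32-bit long). The > prefix selects big-endian byte order with standard sizes and no padding, which is what produces the bytes shown; with = or no prefix the result depends on the machine:
>>> from struct import *
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8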
It sounds like your data may be variable length ... which means you may not be able to modify the data in place.
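When records are variable length, one common approach is to read the whole file into Python objects, modify them, and write everything back out. The sketch below assumes a made-up layout (a fixed header holding an id and an item count, followed by that many fixed-size items); the real field types and order have to come from the C definitions, and the names HEADER, ITEM, read_records and write_records are illustrative only:

```python
import io
import struct

# Assumed layout, for illustration only: each record is a fixed header
# (uint32 id, uint16 item count) followed by `count` items (int16, float32).
HEADER = struct.Struct('<IH')
ITEM = struct.Struct('<hf')

def read_records(stream):
    """Read records until EOF; return a list of (id, [items]) tuples."""
    records = []
    while True:
        head = stream.read(HEADER.size)
        if not head:
            break  # clean end of file
        if len(head) < HEADER.size:
            raise ValueError("truncated header")
        rec_id, count = HEADER.unpack(head)
        body = stream.read(count * ITEM.size)
        if len(body) < count * ITEM.size:
            raise ValueError("truncated record body")
        items = [list(t) for t in ITEM.iter_unpack(body)]
        records.append((rec_id, items))
    return records

def write_records(stream, records):
    """Serialize every record back out, recomputing each item count."""
    for rec_id, items in records:
        stream.write(HEADER.pack(rec_id, len(items)))
        for item in items:
            stream.write(ITEM.pack(*item))

# Round trip through an in-memory buffer standing in for the real file.
buf = io.BytesIO()
write_records(buf, [(1, [[10, 1.5]]), (2, [[20, 2.25], [30, 0.5]])])
buf.seek(0)
records = read_records(buf)
records[0][1].append([40, 4.0])  # grow the first record by one item
out = io.BytesIO()
write_records(out, records)
```

Because the first record grew, every record after it shifts, which is why the whole stream is rewritten rather than patched in place.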
Here's a Python2 example using both os.read() and mmap. I pre-created /tmp/three_numbers.dat with dd if=/dev/zero of=/tmp/three_numbers.dat count=1 bs=1k:
import mmap
import os
import struct


class ThreeNumbers(object):
    """One record, read and written with os.read()/os.write()."""
    PACK = '=hhl'
    SIZEOF = struct.calcsize(PACK)

    def __init__(self, fd, offset):
        self._fd = fd
        self._offset = offset  # record index, not a byte offset
        # Seek on the raw descriptor so the position matches what os.read() uses.
        os.lseek(fd.fileno(), offset * self.SIZEOF, os.SEEK_SET)
        self._data = os.read(fd.fileno(), self.SIZEOF)
        self.numbers = struct.unpack(self.PACK, self._data)

    def save(self):
        os.lseek(self._fd.fileno(), self._offset * self.SIZEOF, os.SEEK_SET)
        os.write(self._fd.fileno(), struct.pack(self.PACK, *self.numbers))


class ThreeNumbersMMAP(object):
    """The same record, read and written through a memory map."""
    PACK = '=hhl'
    SIZEOF = struct.calcsize(PACK)

    def __init__(self, mm, offset):
        self._mmap = mm
        self._offset = offset  # record index, not a byte offset
        self._data = mm[offset * self.SIZEOF:(offset + 1) * self.SIZEOF]
        self.numbers = struct.unpack(self.PACK, self._data)

    def save(self):
        start = self._offset * self.SIZEOF
        self._mmap[start:start + self.SIZEOF] = struct.pack(self.PACK, *self.numbers)


fd = open("/tmp/three_numbers.dat", "rb+")

# Round trip through os.read()/os.write().
obj = ThreeNumbers(fd, 0)
print(obj.numbers)
obj.numbers = (1, 2, 3)
obj.save()
obj = ThreeNumbers(fd, 0)
print(obj.numbers)
obj.numbers = (0, 0, 0)
obj.save()

# Same round trip through mmap; named mm so the mmap module is not shadowed.
mm = mmap.mmap(fd.fileno(), 0)
obj = ThreeNumbersMMAP(mm, 0)
print(obj.numbers)
obj.numbers = (1, 2, 3)
obj.save()
obj = ThreeNumbersMMAP(mm, 0)
print(obj.numbers)
obj.numbers = (0, 0, 0)
obj.save()

mm.close()
fd.close()
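Since the question asks for a list of objects, the per-record approach above can also be collapsed into one pass over the file. This is only a sketch under the assumption that every record has the same fixed '=hhl' layout used in the example; load_all and save_all are hypothetical helper names, and the scratch file path is created just for the demonstration:

```python
import os
import struct
import tempfile

RECORD = struct.Struct('=hhl')  # same fixed layout as the example classes

def load_all(path):
    """Return every record in the file as a mutable list of three ints."""
    with open(path, 'rb') as f:
        data = f.read()
    if len(data) % RECORD.size:
        raise ValueError("file size is not a multiple of the record size")
    return [list(fields) for fields in RECORD.iter_unpack(data)]

def save_all(path, records):
    """Overwrite the file with the given records."""
    with open(path, 'wb') as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

# Demonstration on a scratch file.
path = os.path.join(tempfile.mkdtemp(), 'numbers.dat')
save_all(path, [(1, 2, 3), (4, 5, 6)])
records = load_all(path)
records[1][2] = 60  # modify one field of the second record
save_all(path, records)
```

Reading everything at once is simpler than mmap for small files; for very large files the memory-mapped classes avoid loading data that is never touched.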
Upvotes: 0