newbie

Reputation: 13

python read class data member from a file

I'm new to Python, so I don't know some basic stuff. I have a binary file that contains an array of objects. The stored objects are traditional C structures. I would like to recreate that structure in Python, read the file contents into a list of its objects, make some modifications to the data, and then store it back. The part I have trouble with is reading the file contents. I've read some similar questions about reading a file, but they didn't answer the questions I have. I've tried defining a class, declaring its members with __slots__, and reading the data with pickle, but it didn't quite work. It may also be relevant that one of the data members is actually an array containing objects of a different structure. What would be the best way to read this file?

Upvotes: 1

Views: 1092

Answers (1)

rrauenza

Reputation: 6973

You'll need to use Python's struct.unpack(). You'll need to know precisely what types the fields are and how they are packed on disk (byte order, sizes, and padding). pickle is specific to Python's own serialization format and won't be of any use for reading C structures, unless you later convert the data to something Python-specific.
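As a minimal Python 3 sketch of that idea, assume a hypothetical C struct with two 16-bit integers and one 32-bit integer, stored little-endian with no padding. The format string '<hhi', the field names, and the helpers read_records() and write_records() are all assumptions for illustration, not your real layout. An array of such records can be read into a list, modified, and written back roughly like this:

```python
import io
import struct
from collections import namedtuple

# Hypothetical layout: int16 x, int16 y, int32 weight; little-endian, no padding.
RECORD = struct.Struct('<hhi')
Point = namedtuple('Point', 'x y weight')


def read_records(f):
    """Read the whole stream and return a list of Point tuples."""
    data = f.read()
    if len(data) % RECORD.size:
        raise ValueError('file size is not a multiple of the record size')
    return [Point._make(fields) for fields in RECORD.iter_unpack(data)]


def write_records(f, records):
    """Overwrite the stream with the packed records."""
    f.seek(0)
    f.truncate()
    for rec in records:
        f.write(RECORD.pack(*rec))


# An in-memory stand-in for open(path, 'r+b').
f = io.BytesIO(RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6))
points = read_records(f)
points[0] = points[0]._replace(weight=30)  # namedtuples are immutable
write_records(f, points)
f.seek(0)
roundtrip = read_records(f)
```

Reading everything into memory is the simplest option for small files; for large ones, reading RECORD.size bytes at a time would avoid holding the whole file at once.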

I recently answered a vaguely similar question here that showed how to mmap() the file, which you may find more convenient than os.read().

I would probably start by creating a class that has a constructor where you initialize it with some combination of the file pointer, mmap object, and the offset.

Then the __init__() method would read the structure and initialize attributes of self with its unpacked contents. Then add accessor methods to modify those attributes, and a save() method to write them all back out using struct.pack() with mmap() or os.write().

Here is an example adapted from the Python docs of packing and unpacking three integers (two 16-bit shorts followed by a 32-bit long). The > prefix selects big-endian byte order with standard sizes and no padding, which is what the bytes shown correspond to:

>>> from struct import *
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8
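The prefix character of the format string decides byte order and padding, so it has to match how the C program wrote the file. A small Python 3 sketch (where packed data is bytes rather than str) of the difference; which prefix fits your file is something only its producer can tell you:

```python
import struct

# '>' is big-endian, '<' is little-endian; both use standard sizes, no padding.
big = struct.pack('>hhl', 1, 2, 3)
little = struct.pack('<hhl', 1, 2, 3)

# Standard sizes: h is 2 bytes and l is 4 bytes, so the record is 8 bytes.
standard_size = struct.calcsize('<hhl')

# '@' (the default) uses the platform's native sizes and alignment, the way a
# C compiler would lay out the struct, so the result can differ per machine.
native_size = struct.calcsize('@hhl')

# With native alignment, padding is inserted after the char so the int is
# aligned; '=' keeps native byte order but drops the padding.
padded = struct.calcsize('@bi')
unpadded = struct.calcsize('=bi')
```

If the C struct was written with a plain fwrite() of the struct and no packing pragma, the native '@' layout on the same platform is usually the one to try first.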

It sounds like your data may be variable length, which means you may not be able to modify it in place: if a record grows or shrinks, everything after it in the file has to be rewritten.
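For the variable-length case, one possible approach is to parse the whole file into Python objects, modify them, and serialize everything again. The sketch below (Python 3) assumes a purely hypothetical layout in which each record starts with a header holding an id and an item count, followed by that many nested items; the formats '<iH' and '<hh' and the names parse_records() and dump_records() are made up for illustration:

```python
import struct

# Hypothetical layout, little-endian, no padding:
#   header: int32 id, uint16 item count
#   items:  `count` entries of (int16 kind, int16 value)
HEADER = struct.Struct('<iH')
ITEM = struct.Struct('<hh')


def parse_records(data):
    """Walk the buffer record by record, since each record has its own length."""
    records = []
    offset = 0
    while offset < len(data):
        rec_id, count = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        items = []
        for _ in range(count):
            items.append(ITEM.unpack_from(data, offset))
            offset += ITEM.size
        records.append({'id': rec_id, 'items': items})
    return records


def dump_records(records):
    """Serialize everything again; counts are recomputed from the lists."""
    out = bytearray()
    for rec in records:
        out += HEADER.pack(rec['id'], len(rec['items']))
        for item in rec['items']:
            out += ITEM.pack(*item)
    return bytes(out)


raw = dump_records([
    {'id': 1, 'items': [(10, 100), (11, 110)]},
    {'id': 2, 'items': []},
])
records = parse_records(raw)
records[1]['items'].append((20, 200))  # grows the second record
rewritten = dump_records(records)
```

Because the second record grew, the rewritten buffer is longer than the original, which is why this kind of change cannot be patched into the file at a fixed offset.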

Here's a Python 2 example using both os.read() and mmap. I pre-created /tmp/three_numbers.dat with dd if=/dev/zero of=/tmp/three_numbers.dat count=1 bs=1k:

import mmap
import os
import struct


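class ThreeNumbers(object):
    """One '=hhl' record, read and written with os.read()/os.write()."""

    PACK = '=hhl'
    SIZEOF = struct.calcsize(PACK)

    def __init__(self, fd, offset):
        # offset is a record index, not a byte offset.
        self._fd = fd
        self._offset = offset
        # Seek on the raw descriptor so the file object's own buffering
        # cannot get out of sync with os.read().
        os.lseek(fd.fileno(), offset * self.SIZEOF, os.SEEK_SET)
        self._data = os.read(fd.fileno(), self.SIZEOF)
        self.numbers = struct.unpack(self.PACK, self._data)

    def save(self):
        os.lseek(self._fd.fileno(), self._offset * self.SIZEOF, os.SEEK_SET)
        os.write(self._fd.fileno(), struct.pack(self.PACK, *self.numbers))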


class ThreeNumbersMMAP(object):
    """The same record, accessed through a memory-mapped file."""

    PACK = '=hhl'
    SIZEOF = struct.calcsize(PACK)

    def __init__(self, mm, offset):
        # mm is an mmap object; offset is a record index.
        self._mmap = mm
        self._offset = offset
        start = offset * self.SIZEOF
        self._data = mm[start:start + self.SIZEOF]
        self.numbers = struct.unpack(self.PACK, self._data)

    def save(self):
        start = self._offset * self.SIZEOF
        self._mmap[start:start + self.SIZEOF] = struct.pack(self.PACK, *self.numbers)


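fd = open("/tmp/three_numbers.dat", "rb+")

# Read record 0, change it, and write it back through the file descriptor.
obj = ThreeNumbers(fd, 0)
print(obj.numbers)
obj.numbers = (1, 2, 3)
obj.save()

# Read it again to confirm the change, then reset it to zeros.
obj = ThreeNumbers(fd, 0)
print(obj.numbers)
obj.numbers = (0, 0, 0)
obj.save()

# The same round trip through a memory map of the whole file.
# (Named mm so that it does not shadow the mmap module.)
mm = mmap.mmap(fd.fileno(), 0)

obj = ThreeNumbersMMAP(mm, 0)
print(obj.numbers)
obj.numbers = (1, 2, 3)
obj.save()

obj = ThreeNumbersMMAP(mm, 0)
print(obj.numbers)
obj.numbers = (0, 0, 0)
obj.save()

mm.close()
fd.close()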

Upvotes: 0
