PekkaRo

Reputation: 80

Data locality optimization in Python

The question is about how memory is allocated in Python for this class, and it is asked more out of curiosity than for real-life value.

The following post-processing class handles huge amounts of data (numpy arrays). This is a simplified version of the real code to make the problem readable. The shared data from the base class is now replaced with the characteristic_length variable.

class Reader(object):
    characteristic_length = 1
    @staticmethod
    def factory(type_name, file_name):
        # Dispatch on the reader type; extend with further readers as needed.
        if type_name == "OFReader":
            return OFReader(file_name)
        ...
        assert 0, "Bad reader instantiation: " + type_name
...
class OFReader(Reader):
    def __init__(self, of_file_name):
        self.file_name = of_file_name
        self.read_data()
    ...
    def length(self):
        return self.data[:, 0] / self.characteristic_length
...

Being a stranger to the Python way of handling memory, I am curious whether placing characteristic_length in the subclass would significantly improve performance.
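For what it's worth, the difference could be measured directly; a minimal timing sketch (toy classes, not the real readers):

import timeit

setup = """
class Base(object):
    x = 1

class DerivedFar(Base):
    pass                 # attribute lookup falls through to Base

class DerivedNear(Base):
    x = 1                # attribute found on the subclass itself

far, near = DerivedFar(), DerivedNear()
"""
# Compare the cost of the two lookup paths:
print(timeit.timeit("far.x", setup=setup))
print(timeit.timeit("near.x", setup=setup))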

Based on the Python documentation, I imagine that a derived class instance holds a pointer to where the base class is defined. If the lookup in the subclass fails, it continues in the base class. This makes using base class attributes to store data shared by the derived classes sound awfully expensive.
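For illustration, a minimal sketch of that lookup order (toy classes again):

class Base(object):
    shared = 1           # class attribute stored on Base

class Derived(Base):
    pass

d = Derived()
# The lookup for d.shared checks the instance dict, then Derived, then Base:
print(d.shared)          # 1
print(Derived.__mro__)   # (Derived, Base, object) -- the search order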

In the real application the shared data is a somewhat large numpy array, common to all derived classes. Being lazy, I want to avoid reading the data separately for each subclass. Any pointers on how to solve this effectively?

Upvotes: 2

Views: 805

Answers (1)

Roland Smith

Reputation: 43495

The simplest solution is to make the numpy array a global. If you want to prevent users from modifying that array, you can set its writeable flag to False.
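For example (a sketch; np.load and the file name are placeholders, since the question does not show how the data is actually read):

import numpy as np

# Module-level (global) array, read once at import time.
SHARED_DATA = np.load("shared.npy")
SHARED_DATA.flags.writeable = False  # further writes now raise ValueError

Every reader can then import SHARED_DATA instead of re-reading the file.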

Alternatively, you could make it a class attribute of the Reader class.
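Roughly like this (same placeholder loading call); the array is read once when the class body executes, and all subclasses find it through normal attribute lookup:

import numpy as np

class Reader(object):
    # Loaded once, when the class statement runs; OFReader and any other
    # subclass see the same array via self.shared_data.
    shared_data = np.load("shared.npy")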

Upvotes: 1
