Best way to store fixed-keys key:value datasets that are accessed by keys in python?

Question

What I want is to be able to handle sets of data that have a fixed set of keys. All keys are strings. The data will never be edited. I know this can be done with normal dicts like so:

data_a = {'key1': 'data1a', 'key2': 'data2a', 'key3': 'data3a'}
data_b = {'key1': 'data1b', 'key2': 'data2b', 'key3': 'data3b'}
data_c = {'key1': 'data1c', 'key2': 'data2c', 'key3': 'data3c'}

They must be able to be called like so:

data_a['key1'] # Returns 'data1a'

However, this looks to be a waste of memory (since dictionaries apparently keep themselves 1/3 empty or something like that, along with also storing the keys multiple times) and also tedious to create as well since I need to keep entering the same keys over and over again in my code. I also risk accidentally changing something in the datasets.

My current solution is to have a set of keys stored in a tuple first, then store the data as tuples too. It looks like this:

keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')
data_b = ('data1b', 'data2b', 'data3b')
data_c = ('data1b', 'data2c', 'data3c')

To retrieve data, I would do this:

data_a[keys.index('key1')] # Returns 'data1a'

Then, I learned about this thing called namedtuples which seem to be able to do what I needed:

import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1b', 'data2c', 'data3c')

However, it appears I can't simply call the value by the key. Instead, to retrieve the data by the key, I have to use getattr, which doesn't seem very intuitive:

getattr(data_a,'key1') # Returns 'data1a'

My criteria is for memory efficiency first, then performance efficiency. Of these 3 methods, which would be the best way to do things? Or am I missing something and there's a more pythonic idiom to get what I want?

EDIT: I've now recently also learned about the existence of __slots__, which apparently runs more efficiently for key:value pairs while pretty much consuming the same(?) amount of memory. Would an implementation acting similar to this be a suitable alternative to namedtuples?

Fred Foo · Accepted Answer

Yes, __slots__ should do.

class Data:
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        if key not in self.__slots__:
            raise KeyError("%r not found" % key)
        return getattr(self, key)

Let's try that out:

>>> Data(1, 2)["key1"]
1

The conditional on key not in self.__slots__ is a sanity check; getattr would happily fetch __init__ for us if it weren't present.

Best way to store fixed-keys key:value datasets that are accessed by keys in python?

Answers (2)

Related Questions