Reputation: 6066
I want to handle several datasets that share a fixed set of keys. All keys are strings, and the data will never be edited. I know this can be done with ordinary dicts, like so:
data_a = {'key1': 'data1a', 'key2': 'data2a', 'key3': 'data3a'}
data_b = {'key1': 'data1b', 'key2': 'data2b', 'key3': 'data3b'}
data_c = {'key1': 'data1c', 'key2': 'data2c', 'key3': 'data3c'}
Values must be retrievable by key, like so:
data_a['key1'] # Returns 'data1a'
However, this looks like a waste of memory (dictionaries apparently keep about a third of their hash table empty, and every dict repeats the same keys), and it is tedious to write, since I have to type the same keys over and over. I also risk accidentally changing something in the datasets.
My current solution is to store the keys in a tuple first, then store each dataset as a tuple too. It looks like this:
keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')
data_b = ('data1b', 'data2b', 'data3b')
data_c = ('data1c', 'data2c', 'data3c')
To retrieve data, I would do this:
data_a[keys.index('key1')] # Returns 'data1a'
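keys.index() scans the keys tuple on every lookup, so its cost grows with the number of keys. A minimal sketch of a common alternative, assuming the same keys and data_a as above: build a name-to-position dict once and share it across all records. KEY_INDEX and lookup are hypothetical names used only for illustration:

```python
keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')

# Built once and shared by every record; maps each key name to its tuple position.
# (KEY_INDEX is a hypothetical name.)
KEY_INDEX = {key: i for i, key in enumerate(keys)}

def lookup(record, key):
    """Hypothetical helper: return the field named `key` from a plain-tuple record.

    A single dict probe replaces the linear scan done by keys.index();
    an unknown key raises KeyError.
    """
    return record[KEY_INDEX[key]]

print(lookup(data_a, 'key1'))  # data1a
```

Only one extra dict exists no matter how many records there are, so each record still costs no more than a plain tuple.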
Then I learned about namedtuples, which seem to do what I need:
import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')
However, it appears I can't simply look up a value by its key. Instead, I have to use getattr, which doesn't seem very intuitive:
getattr(data_a,'key1') # Returns 'data1a'
My criteria are memory efficiency first, then speed. Of these three approaches, which is best? Or am I missing a more Pythonic idiom for what I want?
EDIT: I've also recently learned about __slots__, which apparently handles key:value pairs more efficiently while consuming roughly the same(?) amount of memory. Would a class built along those lines be a suitable alternative to namedtuples?
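One rough way to compare the per-record memory of these approaches is sys.getsizeof. This is a sketch under stated assumptions: the exact byte counts vary with the Python version and platform, and getsizeof is shallow, so it measures only the container itself, not the value strings it references. The class SlotData is a hypothetical __slots__ version with one slot per key:

```python
import sys
from collections import namedtuple

keys = ('key1', 'key2', 'key3')
values = ('data1a', 'data2a', 'data3a')

class SlotData:
    # Hypothetical __slots__ class: one slot per key, no per-instance __dict__.
    __slots__ = keys

    def __init__(self, key1, key2, key3):
        self.key1, self.key2, self.key3 = key1, key2, key3

Data = namedtuple('Data', keys)

records = {
    'dict': dict(zip(keys, values)),
    'tuple': values,  # the keys tuple is stored once, separately
    'namedtuple': Data(*values),
    'slots': SlotData(*values),
}

# Shallow sizes: the container only, not the strings it points to.
sizes = {name: sys.getsizeof(obj) for name, obj in records.items()}

for name, size in sizes.items():
    print(f"{name:>10}: {size} bytes")
```

On CPython the dict is expected to be clearly the largest of the four, while the tuple, namedtuple and slotted instance stay close to each other; run it on your own interpreter rather than relying on fixed numbers.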
Upvotes: 2
Views: 2901
Reputation: 363547
Yes, __slots__ should do.
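class Data:
    # Only these attributes may exist; instances get no per-object __dict__.
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        # Allow dict-style access, but only for declared slot names.
        if key not in self.__slots__:
            raise KeyError("%r not found" % key)
        return getattr(self, key)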
Let's try that out:
>>> Data(1, 2)["key1"]
1
The key not in self.__slots__ condition is a sanity check: without it, getattr would happily fetch __init__ (or any other method) for us.
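One point worth checking against the question's "never edited" requirement: __slots__ limits which attributes an instance can have, but it does not make the existing ones read-only, whereas a namedtuple rejects assignment. A quick sketch, re-declaring the Data class above so the snippet runs on its own; Frozen is a hypothetical namedtuple used only for contrast:

```python
from collections import namedtuple

class Data:
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        if key not in self.__slots__:
            raise KeyError("%r not found" % key)
        return getattr(self, key)

d = Data(1, 2)
d.key1 = 99  # allowed: an existing slot can be reassigned
print(d["key1"])  # 99

try:
    d.key3 = 3  # rejected: there is no slot named key3 and no __dict__
except AttributeError as exc:
    print("slots:", exc)

Frozen = namedtuple("Frozen", ["key1", "key2"])  # hypothetical name, for contrast
f = Frozen(1, 2)
try:
    f.key1 = 99  # rejected: namedtuple fields are read-only
except AttributeError as exc:
    print("namedtuple:", exc)
```

So if accidental modification is a concern, the slotted class only guards against typos in new attribute names, while the namedtuple guards against changing values too.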
Upvotes: 1
Reputation: 65791
namedtuple seems the right thing to use. If your "keys" are fixed, you don't need getattr and can use the normal syntax for retrieving an object's attributes:
In [1]: %paste
import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')
## -- End pasted text --
In [2]: data_a.key1
Out[2]: 'data1a'
This usage is also demonstrated in the docs:
>>> # Basic example
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
You don't usually need getattr when the second argument (the attribute name) is a constant. It's only needed when the name is determined at runtime:
In [3]: attr = input('Attribute: ')
Attribute: key3
In [4]: getattr(data_b, attr)
Out[4]: 'data3b'
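If you still want the question's data_a['key1'] syntax on top of a namedtuple, one option is a small subclass whose __getitem__ accepts field names and falls back to normal tuple indexing otherwise. This is a sketch, not a standard-library feature; the names _DataBase and Data are illustrative:

```python
from collections import namedtuple

_DataBase = namedtuple('_DataBase', ('key1', 'key2', 'key3'))

class Data(_DataBase):
    # An empty __slots__ keeps instances as small as the plain namedtuple
    # (no per-instance __dict__).
    __slots__ = ()

    def __getitem__(self, key):
        if isinstance(key, str):
            # Only accept real field names, so methods such as 'count'
            # or '_asdict' cannot be fetched by accident.
            if key not in self._fields:
                raise KeyError(key)
            return getattr(self, key)
        # Integers and slices behave exactly like a regular tuple.
        return super().__getitem__(key)

data_a = Data('data1a', 'data2a', 'data3a')
print(data_a['key1'])  # data1a
print(data_a[0])       # data1a
print(data_a.key2)     # data2a
```

The extra type check runs in Python on every bracket lookup, so plain attribute access (data_a.key1) is likely still the faster choice when the field name is known in advance.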
Upvotes: 1