Reputation: 5031
I am looking for the best way to create a list in Python that builds hashed indexes (dicts) for all the properties of the objects put into it.
>>> foo = IndexingList([{ 'id': 1, 'name': 'cat' }, { 'id': 2, 'name': 'dog' }])
>>> foo[0]
{'id': 1, 'name': 'cat'}
>>> foo.findall('id', 2)
[{'id': 2, 'name': 'dog'}]
>>> foo += {'id': 3, 'name': 'dog'}
>>> foo.findall('name', 'dog')
[{'id': 2, 'name': 'dog'}, {'id': 3, 'name': 'dog'}]
I imagine the data structure of the IndexingList would then look like this:
{
    'items': [
        { 'id': 1, 'name': 'cat' },
        { 'id': 2, 'name': 'dog' },
        { 'id': 3, 'name': 'dog' }
    ],
    'indexes': {
        'id': {
            1: [{ 'id': 1, 'name': 'cat' }],
            2: [{ 'id': 2, 'name': 'dog' }],
            3: [{ 'id': 3, 'name': 'dog' }]
        },
        'name': {
            'cat': [{ 'id': 1, 'name': 'cat' }],
            'dog': [
                { 'id': 2, 'name': 'dog' },
                { 'id': 3, 'name': 'dog' }
            ]
        }
    }
}
where the objects within the 'indexes' nodes refer to the same objects in 'items'.
I think property values that are themselves objects could receive unique index-keys by using str(property) to obtain something to stick in 'indexes'.
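To make that concrete, here is a minimal sketch of both ideas: the index lists hold references to the very same objects as 'items', and unhashable property values fall back to str(). The helper names index_key and build_indexes, and the extra 'tags' property, are made up for illustration. One caveat of the str() fallback is that an unhashable value can end up sharing a key with a real string that has the same text.

```python
from collections import defaultdict


def index_key(value):
    """Return value itself if it is hashable, else str(value) (assumed fallback)."""
    try:
        hash(value)
        return value
    except TypeError:
        return str(value)


def build_indexes(items):
    """Map property -> index key -> list of the very same item objects."""
    indexes = defaultdict(lambda: defaultdict(list))
    for item in items:
        for prop, value in item.items():
            indexes[prop][index_key(value)].append(item)
    return indexes


items = [{'id': 1, 'name': 'cat', 'tags': ['pet', 'small']},
         {'id': 2, 'name': 'dog', 'tags': ['pet']}]
indexes = build_indexes(items)

# The index stores references to the original dicts, not copies.
print(indexes['name']['dog'][0] is items[1])
# The unhashable list value was indexed under its string form.
print(indexes['tags'][str(['pet'])])
```

Because the entries are shared, changing a property on an item after it was indexed leaves it filed under the old key until the index is rebuilt.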
Upvotes: 0
Views: 193
Reputation: 3068
I must say that Lattyware provides a really nice solution. I'll still provide my own quick and dirty approach, since indexing on unique items is a simple one-liner. Instead of building a nice wrapper container, I sometimes create an index on a certain column:
my_list = [('aap', 123), ('noot', 234), ('mies', 345), ('mies', 456)]
Provided the key in that column is unique, and we neither add new elements to the list nor modify the values we indexed on, we can use:
def mk_unique_index(data, col):
    g = ((elem[col], elem) for elem in data)
    return dict(g)
So we can use it like:
>>> idx = mk_unique_index(my_list, 1)
>>> idx[123]
('aap', 123)
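The uniqueness requirement matters because dict() silently keeps only the last element for a repeated key. A small sketch that reuses the same list and helper, this time on column 0:

```python
my_list = [('aap', 123), ('noot', 234), ('mies', 345), ('mies', 456)]


def mk_unique_index(data, col):
    # Same helper as above: one dict entry per key.
    return dict((elem[col], elem) for elem in data)


idx = mk_unique_index(my_list, 0)
# 'mies' occurs twice in column 0; dict() keeps only the last pair it sees.
print(idx['mies'])  # ('mies', 456)
print(len(idx))     # 3, although the list has 4 elements
```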
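However, if we wish to index on the 0th column, where 'mies' appears twice, we must use a defaultdict so that each key maps to a list of elements:
from collections import defaultdict

def mk_index(data, col):
    # Each key maps to the list of all elements that share it.
    d = defaultdict(list)
    for elem in data:
        d[elem[col]].append(elem)
    return d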
Usage:
>>> idx = mk_index(my_list, 0)
>>> idx['mies']
[('mies', 345), ('mies', 456)]
If you're using dictionaries instead of tuples (provided all elements have the field you're indexing on), you can just pass the field name as the column. Named tuples only accept integer subscripts, so for those you would look the field up with getattr(elem, col) instead of elem[col]. Obviously, one might also choose to use a temporary in-memory sqlite database.
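One way to cover dictionaries and named tuples with the same helper is to pass a key function instead of a column. This is only a sketch: mk_index_by and the Pet record type are names I made up for the example, while itemgetter and attrgetter come from the standard operator module.

```python
from collections import defaultdict, namedtuple
from operator import attrgetter, itemgetter


def mk_index_by(data, key):
    """Group elements by key(elem); key is any callable."""
    d = defaultdict(list)
    for elem in data:
        d[key(elem)].append(elem)
    return d


Pet = namedtuple('Pet', ['id', 'name'])  # hypothetical record type
dict_rows = [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'}]
tuple_rows = [Pet(1, 'cat'), Pet(2, 'dog'), Pet(3, 'dog')]

by_name_dicts = mk_index_by(dict_rows, itemgetter('name'))    # uses elem['name']
by_name_tuples = mk_index_by(tuple_rows, attrgetter('name'))  # uses elem.name

print(by_name_dicts['dog'])
print(by_name_tuples['dog'])
```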
Upvotes: 0
Reputation: 88987
This is actually pretty easy to do using some collections.defaultdict()s, although you might consider using an actual database if you are using this a lot.
from collections import defaultdict
from functools import partial

class IndexingList:
    def __init__(self, items):
        self.items = []
        # indices[key][value] is the list of items where item[key] == value.
        self.indices = defaultdict(partial(defaultdict, list))
        self.extend(items)

    def _index(self, item):
        try:
            for index, value in item.items():
                self.indices[index][value].append(item)
        except AttributeError as e:
            raise ValueError("All children of an IndexingList must be "
                             "dict-like. '{0}' is not.".format(item)) from e

    def _unindex(self, item):
        for index, value in item.items():
            self.indices[index][value].remove(item)

    def append(self, item):
        self._index(item)
        self.items.append(item)

    def extend(self, iterable):
        for item in iterable:
            self.append(item)

    def __iadd__(self, other):
        self.extend(other)
        return self

    def __getitem__(self, position):
        return self.items[position]

    def __setitem__(self, position, item):
        # Keep the indices in sync: index the new item, then drop the old one.
        # Integer positions only; slices are not handled.
        self._index(item)
        self._unindex(self.items[position])
        self.items[position] = item

    def __delitem__(self, position):
        # Look the item up first; position is an integer, not the item itself.
        self._unindex(self.items[position])
        del self.items[position]

    def find_all(self, index, value):
        return self.indices[index][value]

    def __repr__(self):
        return repr(self.items)
Used like so:
>>> foo = IndexingList([{ 'id': 1, 'name': 'cat' }, { 'id': 2, 'name': 'dog' }])
>>> foo[0]
{'id': 1, 'name': 'cat'}
>>> foo.find_all("id", 2)
[{'id': 2, 'name': 'dog'}]
>>> foo += [{'id': 3, 'name': 'dog'}]
>>> foo.find_all('name', 'dog')
[{'id': 2, 'name': 'dog'}, {'id': 3, 'name': 'dog'}]
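For the database route mentioned above, here is a rough sketch using the standard sqlite3 module with an in-memory database. The table name pets, the index name pets_by_name, and the find_all helper are assumptions made for this example, and it presumes every item has exactly the id and name fields.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute('CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE INDEX pets_by_name ON pets (name)')

pets = [{'id': 1, 'name': 'cat'},
        {'id': 2, 'name': 'dog'},
        {'id': 3, 'name': 'dog'}]
conn.executemany('INSERT INTO pets (id, name) VALUES (:id, :name)', pets)


def find_all(column, value):
    # Column names cannot be bound as parameters, so only accept known ones.
    if column not in ('id', 'name'):
        raise ValueError('unknown column: {0}'.format(column))
    query = 'SELECT id, name FROM pets WHERE {0} = ? ORDER BY id'.format(column)
    return [dict(row) for row in conn.execute(query, (value,))]


print(find_all('name', 'dog'))
```

Unlike the IndexingList, this returns fresh dicts on every query rather than references to the objects you inserted.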
Upvotes: 3