Reputation: 315
I'm working with two hierarchical datasets that have a complex relation between them (I'm not using SQL), and they don't share primary lookup keys. We use this process to keep the two datasets synchronized.
Each dataset is currently stored as a dictionary keyed by that dataset's own key. Once the relation is determined, I store the matching key from the other dataset as an attribute ('fkey') on each record. This has forced me to write some odd-looking helper functions just to follow the parent-child relationships.
I'm wondering whether there is a more effective or faster approach to this madness, since I currently have to pass both datasets to every processing function that needs to parse relationships.
Example:
leftdataset = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
               '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdataset = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
                'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}
To find the parent's fkey, I need to do:
fkey = leftdataset[leftdataset['10000']['parent']]['fkey']
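Spelled out one step at a time against the sample data, the lookup is two hops within leftdataset, plus a third hop into rightdataset whenever the counterpart record itself is needed; that third hop is why both dictionaries end up being passed around. The intermediate variable names are only illustrative.

```python
leftdataset = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
               '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdataset = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
                'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}

# Hop 1: the child record stores its parent's key (in the same dataset).
parent_key = leftdataset['10000']['parent']      # '10001'
# Hop 2: the parent record stores its counterpart's key in the other dataset.
parent_fkey = leftdataset[parent_key]['fkey']    # 'qwer'
# Hop 3, only if the counterpart record itself is needed: switch datasets.
parent_counterpart = rightdataset[parent_fkey]
```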
I was toying with the idea of keeping a list of key-pair tuples and then searching it for the key I need, for example:
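keys = [('10000', 'asdf'), ('10001', 'qwer')]
def find_key(key, keyset):
    # Linear scan: check every pair until one contains the key.
    for keypair in keyset:
        if key in keypair:
            k1, k2 = keypair
            if k1 == key:
                return k2
            else:
                return k1
    return None  # key not found in any pair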
But this sounds even less efficient than what I'm doing now, since it scans the whole list on every lookup. Am I going down the wrong path?
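For what it's worth, the linear scan in find_key can be avoided by building a single dictionary from the same pairs once, so each lookup is an average constant-time dict access rather than a pass over every pair. This is a minimal sketch; build_key_index is a hypothetical name, and, like find_key, it assumes no key appears in both datasets.

```python
def build_key_index(pairs):
    """Map every key to its partner, in both directions.

    Assumes left-hand and right-hand keys never collide; a key present in
    both datasets would be overwritten by whichever pair is seen last.
    """
    index = {}
    for left_key, right_key in pairs:
        index[left_key] = right_key
        index[right_key] = left_key
    return index


keys = [('10000', 'asdf'), ('10001', 'qwer')]
key_index = build_key_index(keys)

# Each lookup is now one dict access instead of a scan over all pairs.
partner = key_index['10000']       # 'asdf'
partner_back = key_index['qwer']   # '10001'
```

Since every record already carries its 'fkey', the pairs could also be derived from the data itself, e.g. [(k, rec['fkey']) for k, rec in leftdataset.items()].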
Upvotes: 2
Views: 212
Reputation: 229301
Is this usage appealing to you?
Easy look-up and usage of single entries:
>>> left("10000")
Entry({'parent': '10001', 'key': '10000', 'fkey': 'asdf'})
>>> left("10000")['key']
'10000'
>>> left("10000")['parent']
'10001'
Easy look-up of parents:
>>> left("10000").parent()
Entry({'parent': '', 'key': '10001', 'fkey': 'qwer'})
>>> left("10000").parent().parent()
>>> left("10001")
Entry({'parent': '', 'key': '10001', 'fkey': 'qwer'})
>>> left("10001") is left("10000").parent()
True
Easy look-up of related entries:
>>> left("10001").related()
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right("qwer")
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right(left("10001").related()['key'])
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right("qwer") is left("10001").related()
True
In particular, here is the example from your question, the parent's foreign key:
>>> left("10000").parent()['fkey']
'qwer'
If so, then here is the code! Classes:
class Entry(object):
    def __init__(self, dataset, d):
        self.dataset = dataset  # the Dataset this entry belongs to
        self.d = d              # the raw record dict

    def parent(self):
        return self.dataset.parent_of(self)

    def related(self):
        # Look up this entry's counterpart in the linked dataset via 'fkey'.
        if not self.dataset.related_dataset:
            raise ValueError("no related dataset specified")
        return self.dataset.related_dataset(self['fkey'])

    def __getitem__(self, k):
        return self.d.__getitem__(k)

    def __repr__(self):
        return "Entry(%s)" % repr(self.d)

    def __str__(self):
        return str(self.d)
class Dataset(object):
    def __init__(self, data):
        # Wrap every raw dict in an Entry that knows which Dataset owns it.
        self.data = dict((k, Entry(self, v)) for (k, v) in data.items())
        self.related_dataset = None

    def set_related_dataset(self, dataset):
        self.related_dataset = dataset

    def entry(self, key):
        # Accept either a key or an Entry that has already been looked up.
        if isinstance(key, Entry):
            return key
        return self.data[key]

    def __call__(self, key):
        return self.entry(key)

    def parent_of(self, entry):
        entry = self.entry(entry)
        # An empty 'parent' string marks a root entry.
        if not entry['parent']:
            return None
        return self.data[entry['parent']]
And the usage for the data you've provided:
leftdata = {'10000': { 'key': '10000', 'fkey':'asdf', 'parent':'10001'},
'10001': { 'key': '10001', 'fkey':'qwer', 'parent':''},}
rightdata = {'asdf': { 'key': 'asdf', 'fkey':'10000', 'parent':'qwer'},
'qwer': { 'key': 'qwer', 'fkey':'10001', 'parent':''}}
left = Dataset(leftdata)
right = Dataset(rightdata)
left.set_related_dataset(right)
right.set_related_dataset(left)
Explanation: Wrap each dict value in an Entry class with __getitem__ defined, to make it usable as a dict (more or less). Have a Dataset class that maps primary keys to these Entrys. Give each Entry access to its dataset and provide the convenience methods .parent() and .related(). For .related() to work, set which dataset is the "related" one with set_related_dataset, and it all ties together.
Now you can even pass just Entrys, and you'll be able to access the related entries without needing to pass both datasets in.
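If wrapper classes feel heavier than needed, the same idea (a record that can follow its own links) can also be sketched by resolving each key into a direct object reference once, after loading. This is an alternative to the Entry/Dataset classes, not a description of them; link_datasets, parent_fkey, and the 'parent_ref'/'related_ref' fields are hypothetical, and the sketch assumes it is acceptable to add fields to the original record dicts.

```python
def link_datasets(left, right):
    """Store direct references on each record, in place.

    'parent_ref' and 'related_ref' are hypothetical field names for this
    sketch; the records are mutated rather than wrapped.
    """
    for dataset, other in ((left, right), (right, left)):
        for record in dataset.values():
            # An empty 'parent' string marks a root record.
            parent = record['parent']
            record['parent_ref'] = dataset[parent] if parent else None
            # 'fkey' is this record's key in the other dataset.
            record['related_ref'] = other[record['fkey']]


def parent_fkey(record):
    """Return the parent's fkey using only the record itself."""
    parent = record['parent_ref']
    return parent['fkey'] if parent is not None else None


leftdata = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
            '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdata = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
             'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}
link_datasets(leftdata, rightdata)
```

The trade-off: the records now reference each other in cycles, so printing one shows recursive {...} markers and it can no longer be serialized with json as-is, and the original dicts are changed in place. Wrapping them, as Entry does, avoids both.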
Upvotes: 1
Reputation: 92559
Based on Mark Ransom's comment, maybe you can organize a class like this:
class Storage(object):
    def __init__(self):
        self._leftdataset = {
            '10000': { 'key': '10000', 'fkey':'asdf', 'parent':'10001'},
            '10001': { 'key': '10001', 'fkey':'qwer', 'parent':''}
        }
        self._rightdataset= {
            'asdf': { 'key': 'asdf', 'fkey':'10000', 'parent':'qwer'},
            'qwer': { 'key': 'qwer', 'fkey':'10001', 'parent':''}
        }

    def get(self, key):
        d1 = self._leftdataset
        d2 = self._rightdataset
        if key in d1:
            left = d1[key]
            right = d2[left['fkey']]
        else:
            right = d2[key]
            left = d1[right['fkey']]
        return left, right
And use a single lookup method:
s = Storage()
s.get('10000')
# ({'fkey': 'asdf', 'key': '10000', 'parent': '10001'},
# {'fkey': '10000', 'key': 'asdf', 'parent': 'qwer'})
s.get('qwer')
# ({'fkey': 'qwer', 'key': '10001', 'parent': ''},
# {'fkey': '10001', 'key': 'qwer', 'parent': ''})
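A possible extension, assuming the datasets are passed in rather than hard-coded: the same two-way resolution in get can be reused to fetch the parent pair, which covers the parent's-fkey case from the question. LinkedStorage and get_parent are hypothetical names, and the sketch assumes the two hierarchies mirror each other, as they do in the sample data.

```python
class LinkedStorage(object):
    """Hypothetical variant of Storage that takes the two datasets as arguments."""

    def __init__(self, left, right):
        self._left = left
        self._right = right

    def get(self, key):
        # Accept a key from either dataset and return the (left, right) pair.
        if key in self._left:
            left = self._left[key]
            right = self._right[left['fkey']]
        else:
            right = self._right[key]
            left = self._left[right['fkey']]
        return left, right

    def get_parent(self, key):
        # Follow the left record's 'parent'; an empty string marks a root.
        # Assumes the right-hand hierarchy mirrors the left-hand one.
        left, _ = self.get(key)
        if not left['parent']:
            return None
        return self.get(left['parent'])


leftdataset = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
               '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdataset = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
                'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}

s = LinkedStorage(leftdataset, rightdataset)
parent_left, parent_right = s.get_parent('10000')
```

Here parent_left['fkey'] is the parent's foreign key from the question, and parent_right is the matching record on the other side, all from a single object.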
Upvotes: 2