Xarses

Reputation: 315

Python: I have a dual-key list; which storage object is most efficient for retrieval?

I'm working with two hierarchical datasets that are linked by a complex relation (I'm not using SQL) and that don't share their primary look-up keys. We use this process to keep the two datasets synchronized.

Each dataset is currently stored as a dictionary with the dataset's key as the dictionary's key. Once the relation is determined, I store the matching key from the other dataset as an attribute (fkey) on each entry. This has forced me to write some odd-looking helper functions to follow the parent-child relationships.

I was wondering if there might be a more efficient or faster way to handle this, as I currently have to pass both datasets to every processing function that needs to parse relationships.

examples:

leftdataset = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
               '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdataset = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
                'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}

In order to find the parent's fkey I need to:

fkey = leftdataset[leftdataset['10000']['parent']]['fkey']

I was toying with the idea of keeping a list of key-pair tuples and then searching it for the key I need, such as:

keys = [('10000', 'asdf'), ('10001', 'qwer')]

def find_key(key, keyset):
  # Scan every pair and return the partner of the matching key.
  for k1, k2 in keyset:
    if key == k1:
      return k2
    if key == k2:
      return k1
  return None  # key is not in any pair

But this sounds even less efficient than what I'm doing now. Am I just going down the wrong path?

Upvotes: 2

Views: 212

Answers (2)

Claudiu

Reputation: 229301

Is this usage appealing to you?

Easy look-up and usage of single entries:

>>> left("10000")
Entry({'parent': '10001', 'key': '10000', 'fkey': 'asdf'})
>>> left("10000")['key']
'10000'
>>> left("10000")['parent']
'10001'

Easy look-up of parents:

>>> left("10000").parent()
Entry({'parent': '', 'key': '10001', 'fkey': 'qwer'})
>>> left("10000").parent().parent()
>>> left("10001")
Entry({'parent': '', 'key': '10001', 'fkey': 'qwer'})
>>> left("10001") is left("10000").parent()
True

Easy look-up of related entries:

>>> left("10001").related()
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right("qwer")
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right(left("10001").related()['key'])
Entry({'parent': '', 'key': 'qwer', 'fkey': '10001'})
>>> right("qwer") is left("10001").related()
True

In particular, here is the example from your question, the parent's foreign key:

>>> left("10000").parent()['fkey']
'qwer'

If so, then here is the code! Classes:

class Entry(object):
    def __init__(self, dataset, d):
        self.dataset = dataset
        self.d = d

    def parent(self):
        return self.dataset.parent_of(self)
    def related(self):
        if not self.dataset.related_dataset:
            raise ValueError("no related dataset specified")
        return self.dataset.related_dataset(self['fkey'])

    def __getitem__(self, k):
        return self.d.__getitem__(k)

    def __repr__(self):
        return "Entry(%s)" % repr(self.d)
    def __str__(self):
        return str(self.d)

class Dataset(object):
    def __init__(self, data):
        self.data = dict((k, Entry(self, v)) for (k,v) in data.items())
        self.related_dataset = None

    def set_related_dataset(self, dataset):
        self.related_dataset = dataset

    def entry(self, key):
        if isinstance(key, Entry): return key
        return self.data[key]
    def __call__(self, key):
        return self.entry(key)

    def parent_of(self, entry):
        entry = self.entry(entry)

        if not entry['parent']:
            return None
        return self.data[entry['parent']]

And the usage for the data you've provided:

leftdata = {'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
            '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}}
rightdata = {'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
             'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}}

left = Dataset(leftdata)
right = Dataset(rightdata)
left.set_related_dataset(right)
right.set_related_dataset(left)

Explanation: Wrap each dict value in an Entry class with __getitem__ defined so it can be used like a dict (more or less). Have a Dataset class that maps primary keys to these Entry objects. Give each Entry a reference to its dataset and provide the convenience methods .parent() and .related(). For .related() to work, tell each dataset which one is its "related" dataset with set_related_dataset, and it all ties together.

Now you can pass around Entry objects alone and still reach the related entries, without needing to pass both datasets in.
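As a rough illustration of that last point, a processing function could take a single entry and walk both hierarchies from it. This is only a sketch: parents_in_sync is a hypothetical helper, not something from your code, and the classes here are condensed copies of the ones above (error handling dropped) so the snippet runs on its own.

```python
# Condensed copies of the Entry/Dataset classes, kept minimal for the demo.
class Entry(object):
    def __init__(self, dataset, d):
        self.dataset = dataset
        self.d = d

    def parent(self):
        return self.dataset.parent_of(self)

    def related(self):
        return self.dataset.related_dataset(self['fkey'])

    def __getitem__(self, k):
        return self.d[k]


class Dataset(object):
    def __init__(self, data):
        self.data = dict((k, Entry(self, v)) for (k, v) in data.items())
        self.related_dataset = None

    def __call__(self, key):
        return self.data[key]

    def parent_of(self, entry):
        return self.data[entry['parent']] if entry['parent'] else None


def parents_in_sync(entry):
    """Hypothetical check: does this entry's parent map to the
    parent of its related entry? Needs only the one entry."""
    mine = entry.parent()
    theirs = entry.related().parent()
    if mine is None or theirs is None:
        # In sync only if both sides are roots.
        return mine is None and theirs is None
    return mine.related() is theirs


left = Dataset({'10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
                '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''}})
right = Dataset({'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
                 'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''}})
left.related_dataset = right
right.related_dataset = left

print(parents_in_sync(left("10000")))
print(parents_in_sync(right("qwer")))
```

The function never sees left or right directly; everything it needs is reachable from the entry it was handed.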

Upvotes: 1

jdi

Reputation: 92559

Based on Mark Ransom's comment, maybe you can organize a class like this:

class Storage(object):

    def __init__(self):

        self._leftdataset = {
            '10000': { 'key': '10000', 'fkey':'asdf', 'parent':'10001'},
            '10001': { 'key': '10001', 'fkey':'qwer', 'parent':''}
        }

        self._rightdataset= {
            'asdf': { 'key': 'asdf', 'fkey':'10000', 'parent':'qwer'},
            'qwer': { 'key': 'qwer', 'fkey':'10001', 'parent':''}
        }

    def get(self, key):
        d1 = self._leftdataset
        d2 = self._rightdataset

        if key in d1:
            left = d1[key]
            right = d2[left['fkey']]
        else:
            right = d2[key]
            left = d1[right['fkey']]

        return left, right

And use a single lookup method:

s = Storage()

s.get('10000')
# ({'fkey': 'asdf', 'key': '10000', 'parent': '10001'},
#  {'fkey': '10000', 'key': 'asdf', 'parent': 'qwer'})


s.get('qwer')
# ({'fkey': 'qwer', 'key': '10001', 'parent': ''},
#  {'fkey': '10001', 'key': 'qwer', 'parent': ''})
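If a caller only needs the matching key rather than both records, the same idea can probably be reduced to one flat dict that maps every key to its counterpart, which replaces the linear scan in your find_key with a single lookup. This is a sketch, not a drop-in: build_key_index is a name I made up, and it assumes, like get() above, that no key appears in both datasets.

```python
def build_key_index(left, right):
    """Hypothetical helper: map every key in either dataset to its
    counterpart key in the other one. Assumes the two key spaces
    never overlap."""
    index = {}
    for key, row in left.items():
        index[key] = row['fkey']
    for key, row in right.items():
        index[key] = row['fkey']
    return index


leftdataset = {
    '10000': {'key': '10000', 'fkey': 'asdf', 'parent': '10001'},
    '10001': {'key': '10001', 'fkey': 'qwer', 'parent': ''},
}
rightdataset = {
    'asdf': {'key': 'asdf', 'fkey': '10000', 'parent': 'qwer'},
    'qwer': {'key': 'qwer', 'fkey': '10001', 'parent': ''},
}

index = build_key_index(leftdataset, rightdataset)

# Works in both directions with one dict lookup each.
print(index['10000'])
print(index['qwer'])

# The parent's foreign key from the question.
print(index[leftdataset['10000']['parent']])
```

The index has to be rebuilt (or updated) whenever an fkey changes, so it fits best when the relation is computed once and then read many times.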

Upvotes: 2
