Reputation: 4930

How do I do advanced Python hash autovivification?

This question is about implementing the full Perl autovivification in Python. I know similar questions were asked before and so far the best answer is in "What is the best way to implement nested dictionaries in Python?" . However, I'm looking to do this:

a['x']['y'].append('z')

without declaring a['x']['y'] = [] first, or rather, not declaring a['x'] = {} either. (Note in Perl you can do push @{$a->{x}{y}}, 'z';.)

I know dict and list classes sorta don't mix, so this is hard, but I'm interested in seeing if someone has an ingenious solution probably involving creating an inherited class from dict but defined a new append method on it?

I also know this might throw off some Python purists who will ask me to stick with Perl. But, even just for a challenge, I'd like to see something.

Upvotes: 11

Answers (5)

snoopyjc

Reputation: 623

Here is my version, which I call ArrayHash. It behaves like an array unless you use a string as a key, then it starts behaving like a hash. It supports initialization, concat with + and +=, and slicing both on the LHS and RHS. In array mode, if you reference or assign to arrayhash[N], it fills in from 0..N-1 with None values. In hash mode it stops doing that. It answers isinstance(arrayhash, dict) with True, and isinstance(arrayhash, collections.abc.Sequence) gives True even if it's in hash mode. I'd love to get your feedback!

from collections import defaultdict
import collections.abc

class _ArrayHash(defaultdict, collections.abc.Sequence):
    def append(self, value):
        self[len(self)] = value

    def extend(self, lst):
        ln = len(self)
        for item in lst:
            self[ln] = item
            ln += 1

    def update(self, values):
        self.isHash = True
        for k, v in values.items():
            self[k] = v

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        elif isinstance(index, int):
            if index < 0:
                index += len(self)
            return super().__getitem__(index)
        else:
            self.isHash = True
            return super().__getitem__(index)

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            try:
                if not hasattr(self, 'isHash'):
                    for i in range(len(self), index.start):
                        super().__setitem__(i, None)
                value = iter(value)
                ndx = index.start
                for i in range(*index.indices(len(self))):
                    super().__setitem__(i, next(value))
                    ndx += 1
                rest = list(value)
                lr = len(rest)
                if lr:
                    for i in range(len(self)-1,ndx-1,-1):  # Move everything else up
                        super().__setitem__(i+lr, super().__getitem__(i))
                for i in range(lr):
                    super().__setitem__(i+ndx, rest[i])
            except StopIteration:
                pass
        elif isinstance(index, int):
            if index < 0:
                index += len(self)
            if not hasattr(self, 'isHash'):
                for i in range(len(self), index):
                    super().__setitem__(i, None)
            super().__setitem__(index, value)
        else:
            self.isHash = True
            super().__setitem__(index, value)

    def __iter__(self):
        if hasattr(self, 'isHash'):
            for i in self.keys():
                yield i
        else:
            for i in range(len(self)):
                yield self[i]

    def __add__(self, other):
        result = ArrayHash(self)
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            result.update(other)
        else:
            result.extend(other)
        return result

    def __iadd__(self, other):
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            self.update(other)
        else:
            self.extend(other)
        return self

    def __radd__(self, other):
        result = ArrayHash()
        if hasattr(self, 'isHash') or (isinstance(other, dict) and not isinstance(other, _ArrayHash)) or hasattr(other, 'isHash'):
            result.update(other)
            result.update(self)
        else:
            result.extend(other)
            result.extend(self)
        return result


def ArrayHash(init=None):
    """Acts like an array with autovivification, unless you use a string as a key, then it becomes a hash"""
    result = _ArrayHash(ArrayHash)
    if init is not None:
        if isinstance(init, collections.abc.Sequence) and not isinstance(init, str):
            result.extend(init)
        elif isinstance(init, _ArrayHash):
            if(hasattr(init, 'isHash')):
                result.update(init)
            else:
                result.extend(init)
        elif isinstance(init, dict):
            result.update(init)
        else:
            result.append(init)
    return result

Upvotes: 0

ncoghlan

Reputation: 41496

Just expanding on Ignacio's answer to present some additional options that allow you to explicitly request more magical behaviour from Python dictionaries. The maintainability of code written this way remains dubious, but I wanted to make it crystal clear that it is a question of "Is this kind of data structure maintainable?" (I have my doubts) not "Can Python be made to behave this way?" (it certainly can).

To support arbitrary levels of nesting for the namespace aspect, all you have to do is name the function (instead of using lambda) and make it self-referential:

>>> from collections import defaultdict
>>> def autodict(): return defaultdict(autodict)
...
>>> a = autodict()
>>> a[1][2][3] = []
>>> type(a[1])
<class 'collections.defaultdict'>
>>> type(a[1][2])
<class 'collections.defaultdict'>
>>> type(a[1][2][3])
<class 'list'>
>>> a[1][2][3]
[]

However, this introduces the "problem" that you have to set a list explicitly before you can append to it. Python's answer to that lies in the setdefault method, which has actually been around longer than collections.defaultdict:

>>> a.setdefault(3, []).append(10)
>>> a.setdefault(3, []).append(11)
>>> a[3]
[10, 11]
>>> a[2].setdefault(3, []).append(12)
>>> a[2].setdefault(3, []).append(13)
>>> a[2][3]
[12, 13]
>>> a[1][2].setdefault(3, []).append(14)
>>> a[1][2].setdefault(3, []).append(15)
>>> a[1][2][3]
[14, 15]

All collections.defaultdict really does is make the common case where you always pass the same second parameter to dict.setdefault much easier to use. For more complex cases, such as this one, you can still use dict.setdefault directly for the aspects that collections.defaultdict can't handle.

Upvotes: 2

Spencer Rathbun

Reputation: 14900

Since we don't know beforehand if we need a dictionary, or a list, then you can't combine autovivification with lists. Unless, Working off of Nosklo's answer from the linked question, you add list "features" to the underlying dictionary. Basically assuming a "sort" order for keys, and always using it with the list methods. I've done this as an example:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature. Has features from both dicts and lists,
    dynamically generates new subitems as needed, and allows for working (somewhat) as a basic type.
    """
    def __getitem__(self, item):
        if isinstance(item, slice):
            d = AutoVivification()
            items = sorted(self.iteritems(), reverse=True)
            k,v = items.pop(0)
            while 1:
                if (item.start < k < item.stop):
                    d[k] = v
                elif k > item.stop:
                    break
                if item.step:
                    for x in range(item.step):
                        k,v = items.pop(0)
                else:
                    k,v = items.pop(0)
            return d
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

    def __add__(self, other):
        """If attempting addition, use our length as the 'value'."""
        return len(self) + other

    def __radd__(self, other):
        """If the other type does not support addition with us, this addition method will be tried."""
        return len(self) + other

    def append(self, item):
        """Add the item to the dict, giving it a higher integer key than any currently in use."""
        largestKey = sorted(self.keys())[-1]
        if isinstance(largestKey, str):
            self.__setitem__(0, item)
        elif isinstance(largestKey, int):
            self.__setitem__(largestKey+1, item)

    def count(self, item):
        """Count the number of keys with the specified item."""
        return sum([1 for x in self.items() if x == item])

    def __eq__(self, other):
        """od.__eq__(y) <==> od==y. Comparison to another AV is order-sensitive
        while comparison to a regular mapping is order-insensitive. """
        if isinstance(other, AutoVivification):
            return len(self)==len(other) and self.items() == other.items()
        return dict.__eq__(self, other)

    def __ne__(self, other):
        """od.__ne__(y) <==> od!=y"""
        return not self == other

This follows the basic autovivification feature of dynamically generating itself for dud keys. However, it also implements some of the methods listed here. This allows it to act like a quasi list/dict thing.

For the rest of the list features, add in the listed methods. I'm treating it as a dictionary with the list methods. If a list method is called, then it makes an assumption about the order of the items held, namely that strings sort lower than integers, and that keys are always in "sorted" order.

It also supports adding, as an example of these methods. This comes from my own use case. I needed to add items from a AutoVivified dictionary, but if it doesn't exist, a new AutoVivification object is created and returned. They have no integer "value" and so you can't do this:

rp = AutoVivification()
rp['a']['b'] = 3
rp['a']['b'] + rp['q']

That defeats the purpose, since I don't know if some thing is going to be there, but I want a default anyway. So I've added the __add__ and __radd__ methods to it. They use the length of the underlying dictionary as the integer value, so a newly created AV object has a value of zero for addition. If a key has something besides an AV object in it, then we get that thing's addition method, if implemented.

Upvotes: 2

tzot

Reputation: 95971

Perhaps this solves your need for any number of “dimensions” in your dictionary:

a= collections.defaultdict(list)

the only change in your code being:

a['x', 'y'].append('z')

Of course, this solution being the one you want depends on two conditions:

whether you need to easily access all lists with e.g. 'x' in the “first dimension“
whether you are stuck to the way Perl magically pleases you more :)

If either of these two conditions is true, my solution won't help you.

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 798744

a = collections.defaultdict(lambda: collections.defaultdict(list))

Upvotes: 16

How do I do advanced Python hash autovivification?

Answers (5)

Related Questions