Just Me
Just Me

Reputation: 431

"Multiton" implementation of frozensets - just one instance per value

How can I implement the Multiton design pattern for frozensets, in a way which works no matter how the frozenset is created?

What I'm looking for is class that behaves just like frozenset, but that gaurantees "full interning": for any two instances, if a == b then a is b.

The answers to this question seem to produce a single instance for every argument passed to the constructor (and also seem to assume that these are hashable). But a given frozenset may be constructed in many different ways: the constructor may get tuples with different ordering of the elements, or unhashable lists; or you might use some operator such as a.union(b) to create the frozenset, etc.

The motivation is, naturally - trying to save memory. I have a graph with many vertices labeled by (among other things) recurring frozensets. The graph is "grown" by creating new vertices from old, and the new frozensets are obtained by adding or removing elements from the old ones.

Many thanks!

Upvotes: 2

Views: 274

Answers (2)

jferard
jferard

Reputation: 8180

You can use the __new__ method to create a wrapper around frozenset. I quote the doc:

new() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation.

The idea is to cache every wrapper created and to return always the same instance for the same frozenset.

There is a little trick: elements of the frozenset that are themselves frozensets, they should be wrapped too.

class FrozenSetWrapper:
    _cache = {}

    def __new__(cls, iterable=[]):
        fs = frozenset(FrozenSetWrapper(e) if isinstance(e, frozenset) else e
                        for e in iterable) # wrap recursively
        fsw = FrozenSetWrapper._cache.get(fs)
        if fsw is None: # was not in cache
            fsw = super(FrozenSetWrapper, cls).__new__(cls) # create an object
            fsw._fs = fs # init
            fsw.__doc__ = fs.__doc__
            FrozenSetWrapper._cache[fs] = fsw # cache
        return fsw # and return

Examples:

f1 = FrozenSetWrapper([1,2,3])
f2 = FrozenSetWrapper([1,2,3])

print(f1, f2)
# <__main__.FrozenSetWrapper object at 0x7f7894f2fa90> <__main__.FrozenSetWrapper object at 0x7f7894f2fa90>

Now, we have to reimplement the methods of frozenset to get a perfect match. This is easy for some of them: just delegate the work to the wrapped frozenset:

def __repr__(self):
    return self._fs.__repr__()

def __iter__(self):
    return self._fs.__iter__()

...

But for some methods, you have to handle frozenset and FrozenSetWrapper:

def __contains__(self, e):
    elif isinstance(e, frozenset):
        e = FrozenSetWrapper(e)

    return self._fs.contains(e)

def __eq__(self, other):
    if isinstance(other, FrozenSetWrapper):
        return self is other
    elif isinstance(other, frozenset)
        return self._fs == other
    else:
        return False

...

Or the return type:

def __and__(self, other):
    if isinstance(other, FrozenSetWrapper):
        return FrozenSetWrapper(self._fs.__and__(other._fs))
    elif isinstance(other, frozenset):
        return FrozenSetWrapper(self._fs.__and__(other))
    else:
        raise TypeError("unsupported operand type(s) ...")

The idea of interning makes sense, but the implementation may be tricky because of the edge cases.

Upvotes: 0

Just Me
Just Me

Reputation: 431

Here's one possible solution.

class Uniquifier(object) :
    """
    This class accepts an immutable object and returns a "canonical" instance of it, if one exists, or keeps it in store as that canonical instance. This is used as a kind of clunky multiton implementation.

    Of course instances are not destroyed until this object is destroyed.
    """
    def __init__(self):
        self._universe = {}


    def uniquify(self, item):
        try :
            return self._universe[item]
        except KeyError :
            self._universe[item] = item
            return self._universe[item]

Running this:

a = frozenset([3,5])
b = frozenset([5,3])
c = frozenset([3]).union([5])
print a==b, b==c
print a is b, b is c, a is c

results in:

True True
False False False

But this:

universe = Uniquifier()
a = universe.uniquify(frozenset([3,5]))
b = universe.uniquify(frozenset([5,3]))
c = universe.uniquify(frozenset([3]).union([5]))
print a == b, b==c
print a is b, b is c, a is c

gives

True True
True True True

as desired.

I was hoping that using some pythonic magic one could hide the Uniquifier logic "under the hood", but I guess this has the advantage of being straightforward and transparent.

Upvotes: 0

Related Questions