Steve Zelaznik
Steve Zelaznik

Reputation: 616

Python set intersections, any way to return elements from the larger set?

When Python takes the intersection of two sets, it always returns elements from the smaller one, which is reasonable in nearly all cases, but I'm trying to do the opposite.

In the piece of code below, note that the intersection yields an integer, not a float.

[in]  >>> x = {1.0,2.0,3.0}
[in]  >>> y = {1}
[in]  >>> x.intersection(y)
[out] >>> {1}
[in]  >>> y.intersection(x)
[out] >>> {1}

If I want to get a float back, I have to use some heavy copying.

[in]  >>> x - y
[out] >>> {2.0,3.0}
[in]  >>> x - (x - y)
[out] >>> {1.0}

I'm dealing with much larger sets than the example above. My question is whether there's any way to trick Python set.intersection method into returning elements from the larger set, or if there another method that can return the float 1.0 besides what I've done here.

The reason why I'm doing this in the first place is I'm trying to implement a frozen dictionary in pure python by sub-classing frozenset. I'm storing the key-value pairs using a subclass of tuple I call "Item" where hash returns the hash of the key only. Using the code below, I'm able to create a set with a single key-value pair inside of it. Then I extract the attribute "value" and return it.

def __getitem__(self, key):
    wrapped = Item((key,),flag=False)
    if not frozenset.__contains__(self, wrapped):
        raise KeyError(key)
    matches = self - (self - {wrapped})
    for pair in frozenset.__iter__(matches):
        return pair.value

I know that the copying is the reason for the slowness because when I try to return an item whose key is not in the dictionary, I get a KeyError immediately, even for sets with 10 million items.

Upvotes: 2

Views: 1545

Answers (1)

mgilson
mgilson

Reputation: 309929

At the risk of answering something different than what you actually asked for (but maybe helping with the end-goal)... a Frozen dict is actually really easy to implement in python:

from collections import Mapping


class FrozenDict(Mapping):

    def __init__(self, *args, **kwargs):
        self._hash = None  # defer calculating hash until needed.
        self._data = dict(*args, **kwargs)

    def __getitem__(self, item):
        return self._data[item]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._dict)

    def __repr__(self):
        return '{}({!r})'.format(type(self), self._data)

    def __hash__(self):
        if self._hash is not None:
            return self._hash
        # Only hashible if the items are hashible.              
        self._hash = hash(tuple(self.items()))


x = FrozenDict({'a': 'b'})
print x
x['c'] = 'Bad Bad Bad'

Of course, this isn't truly frozen (in the same sense that a frozenset is frozen). A user could reach in and modify the data on the frozendict -- But then they deserve any code breakages that they cause.


To answer your actual question, the only alternative that I can think of is to define your own intersection function:

>>> s1 = set([1])
>>> s2 = set([1., 2.])
>>> def intersection(s1, s2):
...     return set(x for x in s1 if x in s2)
... 
>>> intersection(s1, s2)
set([1])
>>> intersection(s2, s1)
set([1.0])

This one always returns sets, but you could easily modify to return frozenset or the type of the input if you make the assumption that the type of the first input has a constructor that accepts only an iterable:

def intersection(s1, s2):
    output_type = type(s1)
    return output_type(x for x in s1 if x in s2)

Upvotes: 1

Related Questions