Chris Brendel
Chris Brendel

Reputation: 700

Set comprehensions in Python and testing for membership in the set being created

This is actually question about the semantics of set comprehensions, but I first need to explain the context. I'm trying to create a new set of tuples where the paired value in the touple is unique regardless of the order of the values in the pair. Simplifying my actual program, what I have is something like {(1, 2), (2, 1), (3, 4)} and I'd like to get {(1, 2), (3, 4)}

I tried doing something like this:

oldSet = {(1, 2), (2, 1), (3, 4)}

newSet = set()
newSet = {(val1, val2) for (val1, val2) in oldSet if not (val2, val1) in newSet}

However, newSet is {(1, 2), (2, 1), (3, 4)}, implying that something is wrong with my conditional expression. My understanding of comprehensions suggests that the above is syntactic sugar for something like this:

newSet =  set()
for (val1, val2) in oldSet:
  if not (val2, val1) in newSet:
    newSet.add((val1, val2))

This traditional looping structure works (newSet is {(1, 2), (3, 4)}). Is there something about comprehensions that causes the conditional to be evaluated before newSet has any members? I'm fairly new to Python, so I'm wondering if there's something subtle I'm missing.

Thanks!

Upvotes: 2

Views: 92

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1125398

You misunderstood; the set comprehension is a distinct expression, separate from the assignment. The expression produces a new set() object that then will be assigned to newSet, replacing the old set() object you had.

As such, as you iterate and build the set, the previous and separate set() object bound to newSet remains empty. In reality, the set comprehension does this:

newSet = set()
_result = set()
for (val1, val2) in oldSet:
    if not (val2, val1) in newSet:
        result.add((val1, val2))
newSet = _result

You could use side effects to alter a separate set as you iterate:

seen = set()
newSet = {(val1, val2) for (val1, val2) in oldSet
          if not ((val2, val1) in seen or seen.add((val1, val2))}

This uses seen to track what has already been processed and includes a tuple if both conditions are true:

  • the inverse has not already been seen before,
  • the seen.add() operation for the tuple returns a false value. Since seen.add() always returns None, that is always going to be the case.

Note that this now builds the same set twice, so you may as well do a regular loop and be done with it:

newSet = set()
for (val1, val2) in oldSet:
    if not (val2, val1) in newSet:
        newSet.add((val1, val2))

Since your tuples are consist of two values only, you may as well use sorting here; any pair of tuples (a, b), (b, a) has one unique sorting, after all:

newSet = {tuple(sorted(t)) for t in oldSet}

Upvotes: 5

Pynchia
Pynchia

Reputation: 11606

A working alternative is:

newSet = { tuple(sorted(t)) for t in oldSet }

Your solution is checking the presence of the tuple in the set that is being generated, but the name is not bound to the value yet. It will when the comprehension terminates.

Upvotes: 2

Related Questions