Neil
Neil

Reputation: 320

In Python, why are the time complexities different between set operations union, intersection, and symmetric difference?

According to the python wiki these are the average case time complexities for the following set operations between 2 sets s and t:

The time complexities for intersection and difference make sense to me, but I don't understand why the time complexities for union and symmetric difference aren't the same as the time complexity for intersection (O(min(s, t)).

If we called s the smaller set and t the larger set, then wouldn't the following logic for union and symmetric difference both have time complexities of (O(min(s, t))? If this is not how these 2 operations are done under the hood in python then why is that?

# Symmetric Difference
for element in s:
  if element in t:
    t.remove(element)
  else:
    t.add(element)
return t

# Union
for element in s:
  t.add(element)
return t

Upvotes: 2

Views: 1870

Answers (1)

abc
abc

Reputation: 11929

Union

Consider two sets s and t. In order to build a new set which represent the union of s and t you need to iterate over them. This results in a time complexity of O(len(s) + len(t)).

def union(s, t):
   """Simple example for union, ignoring error-handling on inputs, etc."""
   result = set(s)   # copy: O(len(s))
   for el in t:      # iterate over t: O(len(t))
      result.add(el) # ignoring collisions, O(1) amortized time
   return result

Symmetric Difference

def symmetric_difference(s, t):
   """Simple example for symmetric difference, ignoring error-handling on inputs, etc."""
   result = set(t)        # copy: O(len(t))
   for el in s:           # iterate over s: O(len(s))
      if el not in t:    
        result.add(el)
      else:
        result.remove(el)
   return result

What CPython does, is to start from a copy of t, then iterate over s and add or remove element from the output set according to the result of the lookup.
Also in this case assuming the amortized time complexity for the lookup to be O(1), the resulting time complexity should be O(len(s) + len(t)), as for the union.

The table indicates a different time complexity for the average time complexity of the symmetric difference as O(s) and the reason might be that they ignore the time complexity of the make_new_set function (which builds a new set starting from t).

Upvotes: 3

Related Questions