roskakori

Reputation: 3336

How to detect that two Python iterators yield the same items?

Is there a concise and memory-efficient way to find out whether two iterators, lines1 and lines2, yield the same items?

For example, these iterators could be lines retrieved from a file object:

with io.open('some.txt', 'r', encoding='...') as lines1:
  with io.open('other.txt', 'r', encoding='...') as lines2:
    lines_are_equal = ...

Intuitively one could expect that

lines_are_equal = lines1 == lines2  # DOES NOT WORK

would give the desired result. However, this is always False for two distinct iterators, because == falls back to comparing object identity rather than the items yielded.
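A small check illustrates this. It uses list iterators as stand-ins for the file objects, and the sample lines are made up for illustration:

```python
# Two distinct iterators over equal data (stand-ins for the file objects).
lines1 = iter(["first\n", "second\n"])
lines2 = iter(["first\n", "second\n"])

# Iterators do not define their own equality, so == compares identity.
distinct_equal = lines1 == lines2
same_equal = lines1 == lines1

print(distinct_equal)  # False
print(same_equal)      # True
```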

If memory were not an issue, one could convert the iterators to lists and compare them:

lines_are_equal = list(lines1) == list(lines2)  # works but uses a lot of memory

I already checked itertools, expecting to find something like

lines_are_equal = itertools.equal(lines1, lines2)  # DOES NOT WORK

but there does not seem to be any function like that.

The best I could come up with so far is looping over both iterators using itertools.zip_longest() (Python 2: izip_longest()):

lines_are_equal = True
for line1, line2 in itertools.zip_longest(lines1, lines2):
  if line1 != line2:
    lines_are_equal = False
    break

This gives the desired result and is memory efficient, but it feels clumsy and unpythonic.

Is there a better way to do this?

Solution: Applying the collective wisdom from the comments and the answer, this is a one-line helper function that works even if the two iterators are the same object or can yield trailing None values:

import itertools

def iter_equal(items1, items2):
  '''`True` if iterators `items1` and `items2` yield equal items.'''
  # The identity check avoids zipping a single iterator with itself.
  return (items1 is items2) or \
          all(a == b for a, b in itertools.zip_longest(items1, items2, fillvalue=object()))

You still have to make sure the iterators do not have side effects on each other.
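To illustrate that caveat, here is a minimal sketch in which two distinct generators draw from one shared source. The names shared, view1 and view2 are hypothetical and only serve the example; iter_equal is repeated so the snippet runs on its own:

```python
import itertools

def iter_equal(items1, items2):
    """Return True if both iterables yield equal items in the same order."""
    if items1 is items2:
        return True
    return all(a == b for a, b in
               itertools.zip_longest(items1, items2, fillvalue=object()))

# Two distinct generator objects that consume the SAME underlying iterator.
shared = iter([1, 2, 3, 4])
view1 = (x for x in shared)
view2 = (x for x in shared)

# The identity check does not help here: view1 gets 1, view2 gets 2.
shared_result = iter_equal(view1, view2)

# Independent iterators over equal data compare as expected.
independent_result = iter_equal(iter([1, 2, 3, 4]), iter([1, 2, 3, 4]))

print(shared_result)       # False
print(independent_result)  # True
```

The first result says nothing useful about the data, because each generator only ever sees every other item of the shared source.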

Upvotes: 3

Views: 1862

Answers (1)

falsetru

Reputation: 369244

How about using all() with a generator expression:

lines_are_equal = all(a == b for a, b in itertools.zip_longest(lines1, lines2))

UPDATE: If an iterable can produce trailing None values, it is better to specify fillvalue=object(), as user2357112 commented (by default, None is used as the fill value):

lines_are_equal = all(a == b for a, b in
                      itertools.zip_longest(lines1, lines2, fillvalue=object()))
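A short sketch of why the fill value matters, using made-up lists where one has a trailing None that the other lacks:

```python
import itertools

with_trailing_none = [1, 2, None]
shorter = [1, 2]

# Default fill value is None, so the missing item looks equal to the real None.
default_fill = all(a == b for a, b in
                   itertools.zip_longest(with_trailing_none, shorter))

# A fresh object() compares unequal to every real item, including None.
sentinel_fill = all(a == b for a, b in
                    itertools.zip_longest(with_trailing_none, shorter,
                                          fillvalue=object()))

print(default_fill)   # True, although the lengths differ
print(sentinel_fill)  # False
```

For lines read from a text file this cannot happen, since every item is a string, but the sentinel makes the comparison safe for arbitrary iterables.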

If your purpose is to compare two files rather than arbitrary iterables, you can use filecmp.cmp instead:

files_are_equal = filecmp.cmp(filename1, filename2)
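One thing to be aware of: by default filecmp.cmp uses shallow=True, which treats files as equal when their os.stat() signatures (type, size, modification time) match. Passing shallow=False forces a comparison of the contents. A minimal sketch, where the helper name files_have_same_bytes and the temporary files are only assumptions for the example:

```python
import filecmp
import os
import tempfile

def files_have_same_bytes(filename1, filename2):
    """Compare file contents instead of relying on os.stat() signatures."""
    return filecmp.cmp(filename1, filename2, shallow=False)

with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "some.txt")
    second = os.path.join(folder, "other.txt")
    third = os.path.join(folder, "third.txt")
    for path, text in [(first, "a\nb\n"), (second, "a\nb\n"), (third, "a\nc\n")]:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)

    same = files_have_same_bytes(first, second)
    different = files_have_same_bytes(first, third)

print(same)       # True
print(different)  # False
```

Note that this compares raw bytes, so two files holding the same text in different encodings or with different newline conventions count as different, unlike a line-by-line comparison of decoded text.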

Upvotes: 5
