Ross MacArthur

Reputation: 5459

zip-like function that fails if a particular iterator is not consumed

I would like a zip-like function that fails if the rightmost iterator is not consumed. It should yield until the failure.

For example

>>> a = ['a', 'b', 'c']
>>> b = [1, 2, 3, 4]

>>> list(myzip(a, b))
Traceback (most recent call last):
    ...
ValueError: rightmost iterable was not consumed

>>> list(myzip(b, a))
[(1, 'a'), (2, 'b'), (3, 'c')]

Perhaps there is a function in the standard library that can help with this?

Important Note:

In the real context the iterators are not over sequence objects, so I can't just check their length or index them.

Edit:

This is what I have come up with so far

def myzip(*iterables):
    iters = [iter(i) for i in iterables]

    zipped = zip(*iters)

    try:
        next(iters[-1])
        raise ValueError('rightmost iterable was not consumed')
    except StopIteration:
        return zipped

Is this the best solution? It doesn't preserve the state of the rightmost iterator, because calling next on it consumes an item, which might be a problem.
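
A quick experiment suggests the problem goes beyond lost state. This sketch assumes the attempt above is copied unchanged under the hypothetical name `myzip_attempt`; `raises_value_error` is a helper invented here. Because `zip` is lazy, the check runs before anything has been consumed, so the function appears to raise whenever the rightmost iterable is non-empty:

```python
def myzip_attempt(*iterables):
    # Copy of the attempt above, renamed for illustration only.
    iters = [iter(i) for i in iterables]
    zipped = zip(*iters)  # lazy: nothing has been consumed yet
    try:
        next(iters[-1])  # pulls the first item of the rightmost iterator
        raise ValueError('rightmost iterable was not consumed')
    except StopIteration:
        return zipped


def raises_value_error(*iterables):
    """Return True if building and draining the zip raises ValueError."""
    try:
        list(myzip_attempt(*iterables))
    except ValueError:
        return True
    return False


# The case that should succeed fails, because the check runs too early.
premature_failure = raises_value_error([1, 2, 3, 4], ['a', 'b', 'c'])
# Only an empty rightmost iterable passes the check.
empty_rightmost_ok = not raises_value_error([1, 2, 3], [])
```

So the check probably needs to run after the zip has been drained, not before it is returned.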

Upvotes: 1

Views: 455

Answers (4)

iGian

Reputation: 11193

Another option uses zip_longest from itertools. It also returns True or False to indicate whether all lists were consumed. It may not be the most efficient way, but it could be improved:

from itertools import zip_longest
a = ['a', 'b', 'c', 'd']
b = [1, 2, 3, 4, 5]
c = ['aa', 'bb', 'cc', 'dd', 'ee', 'ff']

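def myzip(*iterables):
    fill = object()  # unique fill value, so None may appear in the data
    consumed = True
    zips = []
    for zipped in zip_longest(*iterables, fillvalue=fill):
        if any(item is fill for item in zipped):
            # At least one iterable ran out before the others.
            consumed = False
        else:
            zips.append(zipped)
    return [zips, consumed]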


list(myzip(a, b, c))
#=> [[('a', 1, 'aa'), ('b', 2, 'bb'), ('c', 3, 'cc'), ('d', 4, 'dd')], False]

Upvotes: 0

Patrick Artner

Reputation: 51683

There is already a zip_longest in itertools that allows for "expansion" of the shorter iterable by a default value.

Use that and check if your default value occurs: if so, it would have been a case of "rightmost element not consumed":

from itertools import zip_longest

class MyError(ValueError):
    """Unique "default" value that is recognizable and allows None to be in your values."""

def is_my_error(x):
    return isinstance(x, MyError)

def myzip(a, b):
    """Raises MyError if any non-consumed elements would occur using default zip()."""
    # Materialize the pairs once so that one-shot iterators are not
    # exhausted by the check before the result is returned.
    pairs = list(zip_longest(a, b, fillvalue=MyError()))
    if all(not is_my_error(t) for q in pairs for t in q):
        return iter(pairs)
    raise MyError("Not all items are consumed")


a = ['a', 'b', 'c', 'd']
b = [1, 2, 3, 4]
f = myzip(a, b)
print(list(f)) 
try:
    a = ['a', 'b', ]
    b = [1, 2, 3, 4]
    f = myzip(a, b)
    print(list(f)) 
except MyError as e:
    print(e)

Output:

[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
Not all items are consumed

In the worst case this consumes the full zipped input once for the check and only then returns it as an iterable.
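
If going through everything up front is a concern, the same zip_longest idea can probably be applied lazily, which also matches the "yield until the failure" requirement. The following is only a sketch: `lazy_myzip` and `_MISSING` are names invented here, and it assumes that only leftovers in the rightmost iterable count as an error, as in the question.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel; cannot collide with real data


def lazy_myzip(*iterables):
    """Yield tuples like zip(); raise ValueError if the rightmost
    iterable still has an item when another one has run out."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if row[-1] is _MISSING:
            # The rightmost iterable is exhausted: stop normally.
            return
        if any(item is _MISSING for item in row):
            # Another iterable ran out while the rightmost still had an item.
            raise ValueError('rightmost iterable was not consumed')
        yield row


# Usage with one-shot iterators: the valid pairs are still yielded first.
gen = lazy_myzip(iter('abc'), iter([1, 2, 3, 4]))
first_three = [next(gen) for _ in range(3)]
```

Asking `gen` for a fourth item raises the ValueError, while `lazy_myzip([1, 2, 3, 4], 'abc')` finishes without an error.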

Upvotes: 0

Aleksi Torhamo

Reputation: 6632

There are a few different ways you can go about doing this.

  1. You could use the normal zip() with an iterator and manually check that it gets exhausted.

    def check_consumed(it):
        try:
            next(it)
        except StopIteration:
            pass
        else:
            raise ValueError('rightmost iterable was not consumed')
    
    b_it = iter(b)
    list(zip(a, b_it))
    check_consumed(b_it)
    
  2. You could wrap the normal zip() to do the check for you.

    def myzip(a, b):
        b_it = iter(b)
        yield from zip(a, b_it)
        # Or, if you're on a Python version that doesn't have yield from:
        #for item in zip(a, b_it):
        #    yield item
        check_consumed(b_it)
    
    list(myzip(a, b))
    
  3. You could write your own zip() from scratch, using iter() and next().

    (No code for this one, as option 2 is superior to this one in every way)
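
Option 2 should extend to any number of iterables, matching the `myzip(*iterables)` signature from the question. A minimal self-contained sketch, where the name `myzip_checked` is invented here and the error message is taken from the question:

```python
def myzip_checked(*iterables):
    """zip() wrapper that raises ValueError if the rightmost iterable
    has items left once zip() stops."""
    if not iterables:
        return  # nothing to zip, nothing to check
    # Keep a handle on the rightmost iterator so it can be probed afterwards.
    *others, last = iterables
    last_it = iter(last)
    yield from zip(*others, last_it)
    try:
        next(last_it)
    except StopIteration:
        return  # rightmost iterator is exhausted: success
    raise ValueError('rightmost iterable was not consumed')


# Collect pairs until the failure, as the question asks.
pairs = []
try:
    for pair in myzip_checked(['a', 'b', 'c'], iter([1, 2, 3, 4])):
        pairs.append(pair)
    failed = False
except ValueError:
    failed = True
```

This relies on zip() pulling from its arguments left to right, so when an earlier iterable runs out first, the rightmost iterator has not yet been advanced for that round and the leftover item is still there to be detected.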

Upvotes: 1

Eternal_flame-AD

Reputation: 342

I think this one does the job: before returning, it checks whether the last iterator was completely consumed.

# Example copied from https://stackoverflow.com/questions/19151/build-a-basic-python-iterator
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self): # Python 3: def __next__(self)
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

# modified from https://docs.python.org/3.5/library/functions.html#zip
def myzip(*iterables):
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                elem = next(iterators[-1], sentinel)
                if elem is not sentinel:
                    raise ValueError("rightmost iterable was not consumed")
                else:
                    return
            result.append(elem)
        yield tuple(result)


a = Counter(1,7)
b = range(9)

for val in myzip(a,b):
    print(val)

Upvotes: 0
