Noctis Skytower
Noctis Skytower

Reputation: 22011

Is there any way to dry out this generator?

The following method is in my class and tries to prime itself before gettings its work done. The primer is at lazy at doing its work as the processing loop that follows it. Five lines are repeated in these two loops, and it is not obvious to me what the best approach to eliminating the repetition might be.

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    iterator, buffer, sentinel = iter(iterable), Deque(maxlen=n), object()
    for _ in range(n):
        item = next(iterator, sentinel)
        if item is sentinel:
            # raise ValueError('iterable was too short to satisfy n')
            break
        buffer.append(item)
        yield item
    start_words[buffer.prefix] += 1
    while True:
        if buffer[0][-1] in cls.TERMINATORS:
            start_words[buffer.suffix] += 1
        item = next(iterator, sentinel)
        if item is sentinel:
            break
        buffer.append(item)
        yield item

Is there an effective and clear way of writing those last five lines just once in the class or method?


Addendum

In answer to the question regarding what prefix and suffix are, here is the Deque class:

class Deque(collections.deque):
    """Deque([iterable[, maxlen]]) -> Deque instance"""

    @property
    def prefix(self):
        """Property allowing capture of all but last item in deque."""
        item = self.pop()
        value = tuple(self)
        self.append(item)
        return value

    @property
    def suffix(self):
        """Property allowing capture of all but first item in deque."""
        item = self.popleft()
        value = tuple(self)
        self.appendleft(item)
        return value

Second Version

After taking into account what others had to say, the following method was written for efficiency:

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    iterator, buffer, count = iter(iterable), Deque(maxlen=n), 0
    for item, count in zip(iterator, range(n)):
        buffer.append(item)
        yield item
    if count + 1 < n:
        raise ValueError('iterable was too short to satisfy n')
    start_words[buffer.prefix] += 1
    try:
        while True:
            if buffer[0][-1] in cls.TERMINATORS:
                start_words[buffer.suffix] += 1
            item = next(iterator)
            buffer.append(item)
            yield item
    except StopIteration:
        pass

Third Version

This third version of the method has been adapted from Daniel's insightful answer:

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    count, buffer = 0, Deque(maxlen=n)
    for count, item in enumerate(iterable, 1):
        yield item
        buffer.append(item)
        if count == n:
            start_words[buffer.prefix] += 1
        if count >= n and buffer[0][-1] in cls.TERMINATORS:
            start_words[buffer.suffix] += 1
    if count < n:
        raise ValueError('iterable was too short to satisfy n')

Final Version

This method is significantly better than my first version -- thanks to the people who helped me here.

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    buffer = Deque(maxlen=n)
    for count, item in enumerate(iterable, 1):
        yield item
        buffer.append(item)
        if count == n:
            start_words[buffer.prefix] += 1
        if count >= n and buffer[0][-1] in cls.TERMINATORS:
            start_words[buffer.suffix] += 1
    if len(buffer) < n:
        raise ValueError('iterable was too short to satisfy n')

Upvotes: 1

Views: 117

Answers (1)

Daniel
Daniel

Reputation: 42758

Use for-loop:

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    buffer = Deque(maxlen=n)
    for idx, item in enumerate(iterable, 1):
        buffer.append(item)
        yield item
        if idx == n:
            start_words[buffer.prefix] += 1
        if idx >= n and buffer[0][-1] in cls.TERMINATORS:
            start_words[buffer.suffix] += 1
    if len(buffer) < n:
        raise ValueError('iterable was too short to satisfy n')

Some thoughts on your second version: count is not necessary when using islice:

for item in islice(iterator, n):
    buffer.append(item)
    yield item
if len(buffer) < n:
    raise ValueError('iterable was too short to satisfy n')

Further refactoring leading to:

@classmethod
def __get_start_words(cls, iterable, n, start_words):
    iterable = iter(iterable)
    buffer = deque(islice(iterable, n-1))
    yield from buffer
    if len(buffer) < n - 1:
        raise ValueError('iterable was too short to satisfy n')
    start_words[tuple(buffer)] += 1
    for item in iterable:
        buffer.append(item)
        yield item
        first = buffer.popleft()
        if first[-1] in cls.TERMINATORS:
            start_words[tuple(buffer)] += 1

Upvotes: 5

Related Questions