Reputation: 22011
The following method is in my class and tries to prime itself before gettings its work done. The primer is at lazy at doing its work as the processing loop that follows it. Five lines are repeated in these two loops, and it is not obvious to me what the best approach to eliminating the repetition might be.
@classmethod
def __get_start_words(cls, iterable, n, start_words):
iterator, buffer, sentinel = iter(iterable), Deque(maxlen=n), object()
for _ in range(n):
item = next(iterator, sentinel)
if item is sentinel:
# raise ValueError('iterable was too short to satisfy n')
break
buffer.append(item)
yield item
start_words[buffer.prefix] += 1
while True:
if buffer[0][-1] in cls.TERMINATORS:
start_words[buffer.suffix] += 1
item = next(iterator, sentinel)
if item is sentinel:
break
buffer.append(item)
yield item
Is there an effective and clear way of writing those last five lines just once in the class or method?
Addendum
In answer to the question regarding what prefix
and suffix
are, here is the Deque
class:
class Deque(collections.deque):
"""Deque([iterable[, maxlen]]) -> Deque instance"""
@property
def prefix(self):
"""Property allowing capture of all but last item in deque."""
item = self.pop()
value = tuple(self)
self.append(item)
return value
@property
def suffix(self):
"""Property allowing capture of all but first item in deque."""
item = self.popleft()
value = tuple(self)
self.appendleft(item)
return value
Second Version
After taking into account what others had to say, the following method was written for efficiency:
@classmethod
def __get_start_words(cls, iterable, n, start_words):
iterator, buffer, count = iter(iterable), Deque(maxlen=n), 0
for item, count in zip(iterator, range(n)):
buffer.append(item)
yield item
if count + 1 < n:
raise ValueError('iterable was too short to satisfy n')
start_words[buffer.prefix] += 1
try:
while True:
if buffer[0][-1] in cls.TERMINATORS:
start_words[buffer.suffix] += 1
item = next(iterator)
buffer.append(item)
yield item
except StopIteration:
pass
Third Version
This third version of the method has been adapted from Daniel's insightful answer:
@classmethod
def __get_start_words(cls, iterable, n, start_words):
count, buffer = 0, Deque(maxlen=n)
for count, item in enumerate(iterable, 1):
yield item
buffer.append(item)
if count == n:
start_words[buffer.prefix] += 1
if count >= n and buffer[0][-1] in cls.TERMINATORS:
start_words[buffer.suffix] += 1
if count < n:
raise ValueError('iterable was too short to satisfy n')
Final Version
This method is significantly better than my first version -- thanks to the people who helped me here.
@classmethod
def __get_start_words(cls, iterable, n, start_words):
buffer = Deque(maxlen=n)
for count, item in enumerate(iterable, 1):
yield item
buffer.append(item)
if count == n:
start_words[buffer.prefix] += 1
if count >= n and buffer[0][-1] in cls.TERMINATORS:
start_words[buffer.suffix] += 1
if len(buffer) < n:
raise ValueError('iterable was too short to satisfy n')
Upvotes: 1
Views: 117
Reputation: 42758
Use for
-loop:
@classmethod
def __get_start_words(cls, iterable, n, start_words):
buffer = Deque(maxlen=n)
for idx, item in enumerate(iterable, 1):
buffer.append(item)
yield item
if idx == n:
start_words[buffer.prefix] += 1
if idx >= n and buffer[0][-1] in cls.TERMINATORS:
start_words[buffer.suffix] += 1
if len(buffer) < n:
raise ValueError('iterable was too short to satisfy n')
Some thoughts on your second version: count
is not necessary when using islice
:
for item in islice(iterator, n):
buffer.append(item)
yield item
if len(buffer) < n:
raise ValueError('iterable was too short to satisfy n')
Further refactoring leading to:
@classmethod
def __get_start_words(cls, iterable, n, start_words):
iterable = iter(iterable)
buffer = deque(islice(iterable, n-1))
yield from buffer
if len(buffer) < n - 1:
raise ValueError('iterable was too short to satisfy n')
start_words[tuple(buffer)] += 1
for item in iterable:
buffer.append(item)
yield item
first = buffer.popleft()
if first[-1] in cls.TERMINATORS:
start_words[tuple(buffer)] += 1
Upvotes: 5