Reputation: 13135
Suppose I have code along these lines:
counter = Counter()
text = f.read()
words = words_generator(text)
interesting_words = filter_generator(words)
counter.update(interesting_words)
for i in counter:
    print("Frequency for " + i + ": " + str(counter[i] / sum))
How should I best set the value of sum, which is the number of values yielded by words_generator?
Upvotes: 1
Views: 212
Reputation: 60147
from collections import Counter

class CountItemsWrapper:
    def __init__(self, items):
        self.items = iter(items)
        self.count = 0

    def __next__(self):
        res = next(self.items)
        self.count += 1
        return res

    def __iter__(self):
        return self

counter = Counter()
text = f.read()
words = CountItemsWrapper(words_generator(text))
interesting_words = filter_generator(words)
counter.update(interesting_words)
for i in counter:
    print("Frequency for " + i + ": " + str(counter[i] / words.count))
Basically, CountItemsWrapper is an iterator that just passes values through, but keeps count whenever it does. You can then use the count attribute on the wrapper as your sum.
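A quick self-contained demo of the idea, using a hard-coded word list and a stand-in filter (since words_generator and filter_generator aren't shown in the question):

```python
from collections import Counter

class CountItemsWrapper:
    """Pass-through iterator that counts how many items it has yielded."""
    def __init__(self, items):
        self.items = iter(items)
        self.count = 0

    def __next__(self):
        res = next(self.items)
        self.count += 1
        return res

    def __iter__(self):
        return self

# Stand-ins for the question's words_generator / filter_generator.
words = CountItemsWrapper(["the", "cat", "sat", "on", "the", "mat"])
interesting = (w for w in words if len(w) > 2)  # drops "on"

counter = Counter(interesting)
print(counter["the"])   # 2
print(words.count)      # 6: every word passed through the wrapper
```

Note that the count covers all words that flowed through the wrapper, including ones the downstream filter discarded, which is exactly what the question's sum needs.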
Explanation of the class:
def __init__(self, items):
    self.items = iter(items)
    self.count = 0
This is simple. Keep in mind that instances are iterators, not just iterables, so each instance can be iterated over only once, and each item is counted exactly once.
def __next__(self):
    res = next(self.items)
    self.count += 1
    return res
This is called to get the next item. self.count must be incremented after the call to next, because we allow the StopIteration to propagate and don't want to add to the count if we haven't yielded a value.
def __iter__(self):
    return self
This is an iterator so it returns itself.
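A small sketch to verify that ordering matters, i.e. that an exhausted iterator does not inflate the count:

```python
class CountItemsWrapper:
    def __init__(self, items):
        self.items = iter(items)
        self.count = 0

    def __next__(self):
        res = next(self.items)   # may raise StopIteration...
        self.count += 1          # ...in which case we never reach this line
        return res

    def __iter__(self):
        return self

w = CountItemsWrapper(["a", "b"])
next(w)
next(w)
try:
    next(w)                      # underlying iterator is exhausted
except StopIteration:
    pass
print(w.count)                   # 2, not 3: the failed next() didn't count
```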
Upvotes: 4
Reputation: 101959
The simplest solution is to build a list:
words = list(words_generator(text))
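With the list approach, the value the question calls sum is simply len(words). A minimal sketch, with a stand-in words_generator since the real one isn't shown:

```python
def words_generator(text):
    # Stand-in for the question's generator.
    yield from text.split()

text = "the cat sat on the mat"
words = list(words_generator(text))
total = len(words)   # the count the question calls `sum`
print(total)         # 6
```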
Another option is to use itertools.tee:
words, words_copy = itertools.tee(words_generator(text))
Afterwards you can use both copies of the iterator. However, note that if you first iterate completely over one copy, then it will be faster and more memory efficient to simply build the list. To see any gain memory-wise you should somehow iterate over both copies "at the same time". For example, something like:
filtered = filter_generator(words)
total = 0
for word, _ in zip(filtered, words_copy):  # use itertools.izip in Python 2
    counter[word] += 1
    total += 1
total += sum(1 for _ in words_copy)
This uses at most O(n - k) memory, where n is the number of words in the text and k is the number of interesting words in the text. You may simplify the code a bit using:
from itertools import zip_longest  # izip_longest in Python 2
filtered = filter_generator(words)
total = 0
for word, _ in zip_longest(filtered, words_copy):
    counter[word] += 1
    total += 1
counter.pop(None, None)  # drop the fill value added once filtered runs out
This uses only O(1) memory (if the generators are constant-space). Note however that explicit loops will slow down the code, so in the end, if memory is not a concern, building a list for words may be the better solution.
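Putting the tee approach together as a runnable sketch, with stand-ins for words_generator and filter_generator (which the question doesn't show):

```python
from collections import Counter
from itertools import tee

def words_generator(text):
    yield from text.split()                    # stand-in generator

def filter_generator(words):
    return (w for w in words if len(w) > 2)    # stand-in filter, drops "on"

text = "the cat sat on the mat"
words, words_copy = tee(words_generator(text))

counter = Counter()
total = 0
# Consume both tee branches roughly in lockstep to keep the buffer small.
for word, _ in zip(filter_generator(words), words_copy):
    counter[word] += 1
    total += 1
total += sum(1 for _ in words_copy)  # words the filter dropped

print(total)            # 6: all words, interesting or not
print(counter["the"])   # 2
```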
Upvotes: 0
Reputation: 77912
Quick-and-dirty possible technical solution: wrap your generator in an iterable that keeps track of the number of items seen, i.e.:
class IterCount:
    def __init__(self, iterable):
        self._iterable = iterable
        self._count = 0

    def _itercount(self):
        for value in self._iterable:
            self._count += 1
            yield value

    def __iter__(self):
        return self._itercount()

    @property
    def count(self):
        return self._count

itc1 = IterCount(range(10))
print(list(itc1))
print(itc1.count)

itc2 = IterCount(iter(range(10)))
print(list(itc2))
print(itc2.count)
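One caveat worth noting with this approach: count only reflects what has been consumed so far, so it is reliable only after the wrapped iterable has been fully drained. A small sketch:

```python
class IterCount:
    def __init__(self, iterable):
        self._iterable = iterable
        self._count = 0

    def _itercount(self):
        for value in self._iterable:
            self._count += 1
            yield value

    def __iter__(self):
        return self._itercount()

    @property
    def count(self):
        return self._count

itc = IterCount(range(10))
before = itc.count        # 0: nothing consumed yet
list(itc)                 # drain the iterator
print(before, itc.count)  # 0 10: the count is complete only after draining
```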
Upvotes: 2