Reputation: 3
Suppose I want to create an iterator class that takes another iterator as an input and counts the frequency of the elements. I cannot use lists, dicts, or any other data structure that could be used to store information for multiple elements together so I have to solve this by creating some sort of a nested iteration. Conceptually, I want my class to do the following:
for i in iter(input):
count=0
for j in iter(input):
if i=j:
count+=1
if j = None: #iterator is done
# reset j
# move to next element of i
This is obviously a simplified example in many ways but I hope the general intended structure of my class is clear. Each time I save the value of i and the count to disk but we can ignore that for now.
The first problem I come across is that Python does not allow iterators to be reset once they are consumed which creates an issue with resetting the inner loop (j). I overcome this using itertools.cycle()
when initiating the second iterator which allows endless iteration. Unfortunately, my code below only does one succesfull pass over the data and the first if statement does not return the next value of the outer iterator but instead treats it as if it has been already consumed.
class Match:
def __init__(self, input):
'''
Input is an iterator
'''
self.in1 = input
self.in2 = input
self.c=0 #the count
def __iter__(self):
'''
Initializes the iterators and fetches the first element
'''
self.it1 = iter(self.in1) # initialize the first (outer) iterator
self.it2 = itertools.cycle(iter(self.in2)) # initialize the second (inner) iterator
self.i = next(self.it1) #pin the first elements
self.j = next(self.it2)
return self
def _inc_outter_end(self):
'''increment or end outer iterator'''
try:
self.i = next(self.it1)
except StopIteration:
self.i = None
self.j = None
def __next__(self):
i = self.i
j = self.j
self.j = next(self.it2)
self.c+=1
if self.c ==9:
self.c=0
self._inc_outter_end()
i = self.i
#stop if done
elif i == None:
raise StopIteration()
#skip non-pairs
elif i != j:
return self.__next__()
#count and return matches
elif i==j:
return self.c
Running something like:
i1 = [1,7,2,4,6,6,1,1,3]
for match in Match(iter(i1)):
print(match)
does one pass over the data such that i is always 1 but instead of doing 8 more passes (for all the next elements of the input) stops. Instead I would like it to return the same output as:
i1 = [1,7,2,4,6,6,1,1,3]
for i in i1:
count=0
for j in i1:
if i==j:
count+=1
print(i,count)
giving
1 3
7 1
2 1
4 1
6 2
6 2
1 3
1 3
3 1
Upvotes: 0
Views: 1373
Reputation: 92440
Here is a way you can count the elements in the list just using iterators. This prints one count for each element rather than repeating the counts for the repeated elements. If you really need to report the other way, maybe this will give you an idea.
The basic plan is to loop through the elements and count the target if they match. If they don't match, yield the elements back. The countFirst
function is both a counter and a generator that provides the non-target values:
def countTarget(it, target):
count = 1
for i in it:
if i == target:
count += 1
else:
yield i
print(f"number: {target} count: {count}")
def count(it):
while True:
try:
target = next(it)
except StopIteration:
return
it = countTarget(it, target)
orig = iter([1,7,2,4,6,6,1,1,3])
count(orig)
Prints:
number: 1 count: 3
number: 7 count: 1
number: 2 count: 1
number: 4 count: 1
number: 6 count: 2
number: 3 count: 1
Of course this is not especially efficient — you loop over the iterator once for every unique value. But it seems like more of a through exercise than a practical exercise.
Upvotes: 0
Reputation: 131600
It seems like, for each element in the input iterator, you want to emit the number of times that element is yielded by the entire iterator. Obviously, you can't compute that number until you have fully exhausted the iterator. And that means that, whatever solution you come up with, it must involve somehow storing some information about the elements of the iterator in such a way that information about all the elements is stored at the same time.
However, you also say
I cannot use lists, dicts, or anything else [...]
Now it's not clear what you mean by that (specifically, what does "anything else" refer to exactly?), but one might naturally take this to mean anything you can use to store information about all elements of the iterator at the same time is off limits. If that is your situation, then this task is not possible. You'll have to relax one of your constraints or find a way around doing this altogether.
If what I've posted here is not the correct interpretation of "lists, dicts, or anything else" for your situation, then you'll have to clarify what you mean by that, and perhaps with more clarity a solution will present itself.
Someone may raise the objection that you can do this with itertools.tee()
, which basically copies an iterator and lets you iterate over it twice (or a number of times of your choice). But the underlying implementation of tee()
is effectively equivalent to storing the contents of the iterator in a list, which I assume is ruled out by your condition. (tee()
can actually be more efficient than a list, in that it can store only part of the iterator, not the entire thing, if your usage pattern allows it. But that's not the case here; the task you're trying to undertake requires storing information about the entire iterator.)
Upvotes: 1