Reputation: 4685
I'm trying to implement kind of merge sort using a python generator to find the minimum number among generated numbers and generate the next one and here is my sample code:
class GeneratorSort():
def __init__(self, *args):
self.values = [(arg.next(), i) for i, arg in enumerate(args)]
self.generators = args
def generate(self):
r, index = min(self.values)
self.values[index] = self.generators[index].next()
yield r
def t(l):
for each in l:
yield each
l1 = [2, 5, 6, 8]
l2 = [1, 4, 5, 7]
l3 = [0, 3, 9, 10]
a = GeneratorSort(t(l1), t(l2), t(l3))
But when I try to print sort results I got only 0
and next time an error:
>>> for i in a.generate():
print i
0
And here is the error:
>>> a.generate()
<generator object generate at 0x7fa7bcc37a00>
>>> a.generate().next()
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
a.generate().next()
File "/home/hamid/projects/bfl/workspace/testo.py", line 10, in generate
r, index = min(self.values)
TypeError: 'int' object is not iterable
>>>
I expect from this function to print numbers like 1
,2
,3
,4
,5
and ... sorted. Is there any other way?
Notice that I need the use of generators.
Upvotes: 3
Views: 867
Reputation: 4366
I wrote this simple code using the idea of heapq.merge
from Martijn Pieters
import heapq
def g1():
for i in range(0, 30, 5):
yield i
def g2():
for i in range(15, 25, 2):
yield i
def g3():
for i in range(5, 30, 3):
yield i
result_gen = heapq.merge(
g1(),
g2(),
g3(),
)
## convert it to list
print list(result_gen)
## or simply iterate over it
for x in result_gen:
print x
Upvotes: 1
Reputation: 1121744
You are replacing your (value, index)
tuples with just the value:
self.values[index] = self.generators[index].next()
You need to replace that with a new tuple:
self.values[index] = (self.generators[index].next(), index)
otherwise the iterable assignment fails; you cannot assign one int
to two variables.
Your generator is missing a loop and handling of empty generators:
def generate(self):
while any(self.values):
r, index = min(v for v in self.values if v)
try:
self.values[index] = (self.generators[index].next(), index)
except StopIteration:
self.values[index] = None
yield r
This sets elements of your self.values
list to None
to indicate the iterable has been exhausted. This is not the most efficient way to handle this edge case; in a version I wrote before I used a dictionary to track active iterables and simply deleted from that to keep indices (keys) stable.
Note that you can replace your t()
function with the built-in iter()
function.
Demo:
>>> class GeneratorSort():
... def __init__(self, *args):
... self.values = [(arg.next(), i) for i, arg in enumerate(args)]
... self.generators = args
... def generate(self):
... while any(self.values):
... r, index = min(v for v in self.values if v)
... try:
... self.values[index] = (self.generators[index].next(), index)
... except StopIteration:
... self.values[index] = None
... yield r
...
>>> l1 = [2, 5, 6, 8]
>>> l2 = [1, 4, 5, 7]
>>> l3 = [0, 3, 9, 10]
>>> a = GeneratorSort(iter(l1), iter(l2), iter(l3))
>>> list(a.generate())
[0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
The standard library does it more efficiently still with the heapq.merge()
function; it uses a heap to keep the iterables sorted by lowest value in a very efficient manner; min()
needs to loop through all K iterables, while using a heap only takes log-K steps to keep the heap invariant intact.
>>> import heapq
>>> list(heapq.merge(l1, l2, l3))
[0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
You can study the source code, which has been highly tuned for maximum performance.
Upvotes: 5