Reputation: 13410
I'm making a Monte Carlo simulation, and as part of this task I generate samples uniformly distributed over the interval (0, 100):
generate = lambda: uniform(0,100)
The iterations stop when every pair of adjacent generated points (taken in ascending order) meets the criterion:
check = lambda a, b: (b - a) < 5
I need some structure to efficiently keep all the generated points and be able to go through them in ascending order to run check on every adjacent pair.
Python has a heapq module which provides a very efficient heap structure, so I decided to use it. But I ran into a problem: I found no traversal procedure supported by this module. The only way I found to access the values of the heap in ascending order is heapq.heappop, but it removes the values from the heap.
I found a workaround: I copied the heap object into a new one and iterated with heappop over the copy. But I don't think it's very efficient to copy the whole structure in memory on every iteration.
Is there any other way to do what I'm trying to do more efficiently?
Simplified code for illustration:
import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy

def pairwise(iterable): #get values from iterator in pairs
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

check = lambda a, b: (b - a) < 5
generate = lambda: uniform(0, 100)

def iterate_heap(heap):
    heap = copy(heap) #Here I have to copy the heap to be able to traverse
    try:
        while True:
            yield heapq.heappop(heap)
    except IndexError:
        return

def trial():
    items = []
    for i in count():
        item = generate()
        heapq.heappush(items, item)
        it = iterate_heap(items)
        it = pairwise(it)
        if i > 0 and all(check(a, b) for a, b in it): #if i==0 then 'it' yields no pairs and 'all' returns True
            return i

print "The solution is reached. It took %d iterations." % trial()
The pairwise function is from the itertools recipe here.
Update:
In this implementation with heappop the complexity of each iteration is O(n*log(n)):

- Copying the heap: O(n)
- Adding a new value to the heap: O(log(n))
- Traversing: n elements * O(log(n)) per pop -> O(n*log(n))

Result: O(n + log(n) + n*log(n)) = O(n*log(n)).
But I expect the traversal to be O(n), so the resultant complexity would be O(n).
By the way, if we used a plain sorted list, we would need to sort the list after each addition, which is O(n*log(n)), while the traversal would be n*O(1) -> O(n). So the resultant complexity is still O(n*log(n)).
I have found a solution: the bisect module. Finding the place to insert is O(log(n)). Inserting into the list is O(n) (because of the implementation, all the values after the insertion point have to be shifted). Traversing is O(n). So the resultant complexity is O(n).
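A minimal sketch of this bisect-based variant of trial() above (the name trial_bisect is just illustrative):

import bisect
from itertools import count
from random import uniform

def trial_bisect():
    items = [] # kept sorted at all times
    for i in count():
        bisect.insort(items, uniform(0, 100)) # O(log(n)) search + O(n) shift
        # items is already sorted, so checking adjacent pairs is a plain O(n) scan
        if i > 0 and all(b - a < 5 for a, b in zip(items, items[1:])):
            return i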
Still, I wonder if there is a way to solve this task using heaps in Python.
Upvotes: 5
Views: 8626
Reputation: 579
I created an iterator class that performs a lazy in-order traversal of a min-heap.
To keep track of the next items for iteration, I just use another heap, self.next_items.
import heapq

class HeapIter:

    def __init__(self, heap):
        self.original_heap = heap
        self.next_items = [] # heap of (value, index) candidates for the next element
        if len(self.original_heap) > 0:
            self.next_items.append((self.original_heap[0], 0))

    def current_element(self):
        if len(self.next_items) == 0:
            return None
        return self.next_items[0][0]

    def next(self):
        if len(self.next_items) == 0:
            return None
        next_elem, next_index = heapq.heappop(self.next_items)
        # the children of the popped node become the new candidates
        child_1 = 2 * next_index + 1
        child_2 = child_1 + 1
        if child_1 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_1], child_1))
        if child_2 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_2], child_2))
        return next_elem
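A minimal usage sketch, assuming the HeapIter class above (the sample data is illustrative):

import heapq

data = [30.0, 10.0, 50.0, 20.0, 40.0]
heapq.heapify(data) # data is now a valid min-heap
it = HeapIter(data)
elem = it.next()
while elem is not None:
    print(elem) # 10.0, 20.0, 30.0, 40.0, 50.0 -- ascending order, heap left intact
    elem = it.next()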
Upvotes: 0
Reputation: 13410
I have made some efficiency measurements.
The best performance is achieved using the bisect module: 10000 insertions into the middle of the list clocked 0.037 sec on my computer (Python 2.7).
Using sortedlist from the blist module clocked 0.287 sec for the same number of insertions.
And using a traditional list with sort applied after each append clocked 2.796 sec. (Python now uses the Timsort algorithm, which is argued to be very efficient on nearly sorted lists; still, it turns out to be not as efficient as using bisect.)
The code I used for these measurements:
import bisect
import timeit
import __main__
import blist

N = 10000 #Number of executions
L = 1000 #Length of initial list

def test_f_bisect(a):
    bisect.insort_right(a, 500)

def test_f_list_sort(a):
    a.append(500)
    a.sort()

test_f_blist_init = '''
from __main__ import test_f_blist
import blist
a = blist.sortedlist(range({L}))
'''.format(L=L)

def test_f_blist(a):
    a.add(500)

names = dir(__main__)
for name in names:
    attr = getattr(__main__, name)
    if hasattr(attr, '__call__'):
        if name.startswith('test_f_'):
            init_name = name + '_init'
            if hasattr(__main__, init_name):
                init = getattr(__main__, init_name)
            else:
                init = 'from __main__ import {name}; a = list(range({L}))'.format(name=name, L=L)
            t = timeit.Timer(stmt='{name}(a)'.format(name=name),
                             setup=init)
            time = t.timeit(N)
            print('{name}: {time}'.format(name=name, time=time))
Upvotes: 3
Reputation: 107608
For the record, the right data structure in this case is a B-tree. There is an implementation:

from blist import sortedlist

The runtime complexity is as low as it gets: O(n*log(n)) to construct the list, O(n) to iterate.
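A minimal sketch of the question's trial() on top of it (assuming blist is installed; the name trial_blist is illustrative):

from itertools import count
from random import uniform
from blist import sortedlist

def trial_blist():
    items = sortedlist()
    for i in count():
        items.add(uniform(0, 100)) # O(log(n)) insertion into the B-tree
        vals = list(items) # a sortedlist iterates in ascending order
        if i > 0 and all(b - a < 5 for a, b in zip(vals, vals[1:])):
            return i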
Upvotes: 1
Reputation: 226296
I would use list.sort() on the heap. That leaves the heap condition intact and makes it possible to iterate over the underlying list directly.
FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
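A minimal sketch of that approach, adapted from the question's trial() (the name trial_sort is illustrative):

import heapq
from itertools import count
from random import uniform

def trial_sort():
    heap = []
    for i in count():
        heapq.heappush(heap, uniform(0, 100))
        heap.sort() # a sorted list still satisfies the heap invariant
        # the underlying list is now fully sorted: iterate it directly
        if i > 0 and all(b - a < 5 for a, b in zip(heap, heap[1:])):
            return i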
Upvotes: 6
Reputation: 7336
From the Python docs:
These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!
Is there a reason you can't just treat the heap as a list and iterate over it?
Upvotes: 4