ovgolovin

Reputation: 13410

Traversing heapified list

I'm writing a Monte Carlo simulation. As part of this task I generate samples uniformly distributed over the interval (0, 100).

generate = lambda: uniform(0,100)

The iterations stop when every pair of adjacent generated points (neighbours in sorted order) meets the criterion.

check = lambda a,b: True if (b-a)<5 else False

I need a structure that stores all the generated points efficiently and lets me go through them in ascending order, so that I can run check on every pair of consecutive points.

Python's heapq module provides an efficient heap structure, so I decided to use it.

I ran into a problem: I found no traversal procedure in this module. The only way I found to access the values of the heap in ascending order is heapq.heappop, but it removes the values from the heap.

As a workaround, I copy the heap into a new object and iterate over the copy with heappop. But I don't think copying the whole structure in memory on every iteration is very efficient.

Is there a more efficient way to do what I'm trying to do?


Simplified code for illustration:

import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy


def pairwise(iterable): #get values from iterator in pairs
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)


check = lambda a,b: True if (b-a)<5 else False
generate = lambda: uniform(0,100)


def iterate_heap(heap):
    heap = copy(heap) #Here I have to copy the heap to be able to traverse
    try:
        while True:
            yield heapq.heappop(heap)
    except IndexError:
        return


def trial():
    items = []

    for i in count():
        item = generate()
        heapq.heappush(items, item)

        it = iterate_heap(items)
        it = pairwise(it)

        if i>0 and all(check(a,b) for a,b in it): #if i==0 then 'it' returns no values and 'all' returns True
            return i

print "The solution is reached. It took %d iterations." % trial()

The pairwise function is taken from the recipes in the itertools documentation.


Update: In this implementation with heappop, the complexity of each iteration is O(n*log(n)):

Copying heap: O(n)

Adding a new value to the heap: O(log(n))

Traversing: n elements * O(log(n)) on popping each value from heap -> O(n*log(n)).

Result: O(n+log(n)+n*log(n)) = O(n*log(n))

But I expected the traversal to be O(n), so that the overall complexity would be O(n).

By the way, if we used a plain sorted list, we would need to re-sort it after each insertion, which is O(n*log(n)), while the traversal would be n*O(1) -> O(n). So the overall complexity would still be O(n*log(n)).

I have found a solution: use the bisect module. Finding the insertion point is O(log(n)). Inserting into the list is O(n) (because of the list implementation, all the values after the insertion point have to be shifted). Traversing is O(n). So the overall complexity of each iteration is O(n).
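A minimal sketch of the bisect approach described above, written for Python 3. The `rng` parameter is an assumption added here so that a run can be reproduced; the check criterion and the return value follow the simplified code earlier in the question.

```python
import bisect
import random


def check(a, b):
    """Adjacent points pass when they are less than 5 apart."""
    return (b - a) < 5


def trial(rng):
    """Insert random points into a sorted list until every adjacent pair passes.

    rng is a random.Random instance (hypothetical parameter, for reproducibility).
    Returns the index of the last inserted point, like the original trial().
    """
    items = []
    i = 0
    while True:
        # O(log n) to find the position, O(n) to shift the tail of the list
        bisect.insort(items, rng.uniform(0, 100))
        # items stays sorted, so adjacent pairs can be read directly in O(n)
        if i > 0 and all(check(a, b) for a, b in zip(items, items[1:])):
            return i
        i += 1


result = trial(random.Random(0))
```

The slice `items[1:]` makes an O(n) copy, which does not change the overall O(n) cost per iteration; `itertools.islice` could avoid the copy if memory matters.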

Still, I wonder whether there is a way to solve this task using heaps in Python.

Upvotes: 5

Views: 8626

Answers (5)

Luke Heytens

Reputation: 579

I created an Iterator class that will perform a lazy in-order traversal of a min heap. It has the following advantages:

  1. Doesn't require a copy of the original heap
  2. Doesn't modify the original heap
  3. Lazy iteration is more efficient if stopping early

To keep track of the next items to return, I simply used another heap, self.next_items.

import heapq

class HeapIter:

    def __init__(self, heap):
        self.original_heap = heap
        self.next_items = []
        if len(self.original_heap) > 0:
            self.next_items.append((self.original_heap[0], 0))

    def current_element(self):
        if len(self.next_items) == 0:
            return None
        return self.next_items[0][0]

    def next(self):
        if len(self.next_items) == 0:
            return None
        next_elem, next_index = heapq.heappop(self.next_items)
        child_1 = 2 * next_index + 1
        child_2 = child_1 + 1
        if child_1 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_1], child_1))
        if child_2 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_2], child_2))
        return next_elem
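The same idea can also be sketched as a generator, which plugs directly into for loops, all() and pairwise(). This is only an illustrative variant for Python 3, not a replacement for the class above; the names `iter_heap_sorted` and `frontier` are made up for this example.

```python
import heapq


def iter_heap_sorted(heap):
    """Yield the items of a heapq-style min-heap in ascending order.

    The heap itself is neither copied nor modified.
    """
    if not heap:
        return
    # Frontier of candidates as (value, index into heap); the root is the minimum.
    frontier = [(heap[0], 0)]
    while frontier:
        value, index = heapq.heappop(frontier)
        yield value
        # The children of the item at index i live at 2*i + 1 and 2*i + 2.
        for child in (2 * index + 1, 2 * index + 2):
            if child < len(heap):
                heapq.heappush(frontier, (heap[child], child))


heap = []
for x in [7, 3, 9, 1, 5]:
    heapq.heappush(heap, x)

snapshot = list(heap)                    # to show the heap is left untouched
ordered = list(iter_heap_sorted(heap))
```

Note that every yielded item costs a push and a pop on the auxiliary heap, so a full traversal is still O(n*log(n)), not O(n). The gain comes from not copying the heap and from stopping early, for example when all() hits the first failing pair.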

Upvotes: 0

ovgolovin

Reputation: 13410

I have made some timing measurements.

The best performance is achieved with the bisect module: 10000 insertions into the middle of the list took 0.037 sec on my computer (Python 2.7).

Using sortedlist from the blist module took 0.287 sec for the same number of insertions.

And using a plain list with sort applied after each append took 2.796 sec. (Python now uses the Timsort algorithm, which is said to be very efficient on nearly sorted lists; still, it turns out to be less efficient than using bisect.)


The code I used for these measurements:

import bisect
import timeit
import __main__
import blist

N = 10000 #Number of executions
L = 1000 #Length of initial list

def test_f_bisect(a):
    bisect.insort_right(a,500)


def test_f_list_sort(a):
    a.append(500)
    a.sort()


test_f_blist_init = '''
from __main__ import test_f_blist
import blist
a = blist.sortedlist(range({L}))
'''.format(L=L)
def test_f_blist(a):
    a.add(500)


names = dir(__main__)
for name in names:
    attr = getattr(__main__,name)
    if hasattr(attr,'__call__'):
        if name.startswith('test_f_'):
            init_name = name + '_init'
            if hasattr(__main__, init_name):
                init = getattr(__main__,init_name)
            else:
                init = 'from __main__ import {name}; a = list(range({L}))'.format(name=name, L=L)
            t = timeit.Timer(stmt='{name}(a)'.format(name=name),
                             setup=init)

            time = t.timeit(N)
            print('{name}: {time}'.format(name=name,time=time))

Upvotes: 3

Jochen Ritzel

Reputation: 107608

For the record, the right data structure in this case is a B-Tree. There is an implementation:

 from blist import sortedlist

The runtime complexity is as low as it gets: O(n*log(n)) to construct the list, O(n) to iterate.

Upvotes: 1

Raymond Hettinger

Reputation: 226296

I would use list.sort() on the heap. That leaves the heap condition intact and makes it possible to iterate over the underlying list directly.

FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
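A small sketch of why this works, for Python 3: a sorted list satisfies the heap invariant, because every parent index is smaller than the indices of its children. The helper `is_heap` is written for this example and is not part of heapq.

```python
import heapq
import random


def is_heap(items):
    """Check the min-heap invariant: each parent is <= each of its children."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))


rng = random.Random(1)
heap = []
for _ in range(20):
    heapq.heappush(heap, rng.uniform(0, 100))

heap.sort()                          # in place; the list is still a valid heap
sorted_ok = heap == sorted(heap)     # plain iteration now visits ascending order
heap_ok_after_sort = is_heap(heap)

heapq.heappush(heap, rng.uniform(0, 100))   # heap operations keep working
heap_ok_after_push = is_heap(heap)
```

After the next heappush the list is a valid heap again but not necessarily sorted, so sort() has to be called before each ordered traversal.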

Upvotes: 6

ObscureRobot

Reputation: 7336

From the Python docs:

These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!

Is there a reason you can't just treat the heap as a list and iterate over it?

Upvotes: 4
