Naman
Naman

Reputation: 2669

How can I delete element in log(n) from priority queue implemented through heapq in python?

I have implemented priority queue from heapq data structure in python. Now I want to delete a particular element (by value) from the heap maintaining the heap invariant. I know it can be done by removing the element and heapify() again but that is O(n), and might be very slow since I have a very large heap.

The other thing that I am trying is, if I had known index I could have replaced it with last element and done _shiftup(). But since I don't know the index, I'll have to search, which again is linear time.

Can I keep a parallel dict to point to location and use it? How can I update such dict with every insert to queue?

EDIT:

Actually I need above to implement decreaseKey() in O(log n) time. If there's a better method to directly do that, that's also equivalently good.

Upvotes: 1

Views: 2824

Answers (1)

dano
dano

Reputation: 94951

You may have read this already, but you could use the approach the the heapq docs suggest, which is to just mark the element as removed, without actually removing it from the heap:

The remaining challenges revolve around finding a pending task and making changes to its priority or removing it entirely. Finding a task can be done with a dictionary pointing to an entry in the queue.

Removing the entry or changing its priority is more difficult because it would break the heap structure invariants. So, a possible solution is to mark the existing entry as removed and add a new entry with the revised priority:

pq = []                         # list of entries arranged in a heap
entry_finder = {}               # mapping of tasks to entries
REMOVED = '<removed-task>'      # placeholder for a removed task
counter = itertools.count()     # unique sequence count

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        remove_task(task)
    count = next(counter)
    entry = [priority, count, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED.  Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, count, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')

This way the removal is just a O(1) lookup in a dict. And the removed item will just be ignored when it's popped from the queue later (at a cost of an extra O(log n) on the pop_task operation). The drawback of course, is that if a client is not actually popping items from the queue, the size of the heap will grow, even though it items are being "removed" according to the API.

Upvotes: 3

Related Questions