Reputation: 5520
I am trying to build a heap with a custom sort predicate. Since the values going into it are of "user-defined" type, I cannot modify their built-in comparison predicate.
Is there a way to do something like:
h = heapq.heapify([...], key=my_lt_pred)
h = heapq.heappush(h, key=my_lt_pred)
Or even better, I could wrap the heapq functions in my own container so I don't need to keep passing the predicate.
Upvotes: 159
Views: 152008
Reputation: 29
Following up on Stefan Musarra's answer: his approach works well.
This seems like a good approach for making a max heap in general:
from heapq import heapify, heappop as pop, heappush as push

nums = [5, -6, 20, -3, 5, 0, 12, 5]

# Reverse comparison to simulate a max heap
class Num(int):
    def __lt__(self, other_num):
        return self > other_num

# Heapify
max_heap = [Num(num) for num in nums]
heapify(max_heap)

# Pushing and popping
for n in [-5, 6, 3, 10, -4, 3, -6, 7, 8]:
    push(max_heap, Num(n))
for _ in range(len(max_heap)):
    print(pop(max_heap))
Output:
20
12
10
8
7
6
5
5
5
3
3
0
-3
-4
-5
-6
-6
It works for strings as well:
names = ['sarah', 'david', 'zack', 'xavier', 'carlos', 'alice', 'ethan', 'fred']

class Str(str):
    def __lt__(self, other):
        return self > other

max_heap = [Str(name) for name in names]
heapify(max_heap)
for name in ['bob', 'rick', 'oscar', 'yasmin', 'george', 'peter']:
    push(max_heap, Str(name))
for _ in range(len(max_heap)):
    p = pop(max_heap)
    # p is of type 'Str', it can be converted back with str(p)
    print(p)
Output:
zack
yasmin
xavier
sarah
rick
peter
oscar
george
fred
ethan
david
carlos
bob
alice
Note that the popped elements are still instances of the wrapper class, but they can be converted back with int() or str().
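A variation on the same idea, sketched here as an assumption rather than as part of the answer above: instead of subclassing each built-in type, a single hypothetical `Reverse` wrapper can hold any value and invert its ordering. It assumes the wrapped values are comparable with each other, and it defines only `__lt__`, which is the only comparison heapq uses in CPython 3.

```python
import heapq


class Reverse:
    """Hypothetical wrapper that inverts the ordering of any comparable value."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # "Less than" means "greater than" for the wrapped values, so the
        # min-heap surfaces the largest value first.
        return self.value > other.value


max_heap = [Reverse(x) for x in [5, -6, 20, -3, 5, 0, 12, 5]]
heapq.heapify(max_heap)
heapq.heappush(max_heap, Reverse(10))

# Unwrap with .value instead of int() or str().
popped = [heapq.heappop(max_heap).value for _ in range(len(max_heap))]
print(popped)  # [20, 12, 10, 5, 5, 5, 0, -3, -6]
```

Popped items are still wrappers, but `.value` unwraps them regardless of the original type, so the same class works for ints, strings, or any other mutually comparable values.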
Upvotes: 0
Reputation: 1501
Using the answer from Fanchen Bao above, I created a Max Priority Queue by extending tuple:
import heapq

class MaxTuple(tuple):
    def __lt__(self, other):
        return self[0] > other[0]

my_tuples = [(2, "orange"), (1, "red"), (5, "blue"), (3, "yellow"), (4, "green")]
my_queue = [MaxTuple(t) for t in my_tuples]
heapq.heapify(my_queue)
while my_queue:
    print(heapq.heappop(my_queue))
Which pops the heap from max to min:
(5, 'blue')
(4, 'green')
(3, 'yellow')
(2, 'orange')
(1, 'red')
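One consequence of comparing only `self[0]`: entries with equal priorities are never compared on their second field, so the payload does not have to support `<` at all. A small sketch with made-up dict payloads (the task names are illustrative only):

```python
import heapq


class MaxTuple(tuple):
    # Order only by the first field, reversed, so the largest priority pops first.
    def __lt__(self, other):
        return self[0] > other[0]


# Payloads are dicts, which do not support "<". With plain tuples, entries
# sharing priority 2 could be compared on their dicts and raise TypeError.
entries = [(2, {"task": "b"}), (3, {"task": "c"}), (2, {"task": "a"})]
queue = [MaxTuple(e) for e in entries]
heapq.heapify(queue)

top_priority, top_payload = heapq.heappop(queue)
print(top_priority, top_payload)  # 3 {'task': 'c'}
```

The two priority-2 entries compare as equal in both directions, so heapq treats them as ties and pops them in whatever order the heap happens to hold them.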
Upvotes: 1
Reputation: 31
Simple little trick:
Say you have this list of (name, age) pairs:
a = [('Tim',4), ('Radha',9), ('Rob',7), ('Krsna',3)]
And you want to sort this list by age by adding the entries to a min-heap. Instead of writing all the custom comparator stuff, you can just flip the order of the tuple's contents before pushing it onto the heap. This works because tuples compare element by element: heapq orders them by the first element and only looks at the second one on ties. Like this:
import heapq
heap = []
heapq.heapify(heap)
for element in a:
    heapq.heappush(heap, (element[1], element[0]))
This is a simple trick if it does the job and you don't want to get into the custom comparator mess.
Since heapq is a min-heap, the values come out in ascending order by default. If you want descending order of age, flip the contents and negate the first element of the tuple:
import heapq
heap = []
heapq.heapify(heap)
for element in a:
    heapq.heappush(heap, (-element[1], element[0]))
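To read the original (name, age) pairs back out, pop and undo both the flip and the negation. A short sketch using the same list `a` (the negation trick assumes a numeric key; on equal ages the tuples fall back to comparing names):

```python
import heapq

a = [('Tim', 4), ('Radha', 9), ('Rob', 7), ('Krsna', 3)]

heap = []
for name, age in a:
    # Age first so it drives the ordering; negated so the oldest pops first.
    heapq.heappush(heap, (-age, name))

oldest_first = []
while heap:
    neg_age, name = heapq.heappop(heap)
    oldest_first.append((name, -neg_age))  # undo the flip and the negation

print(oldest_first)  # [('Radha', 9), ('Rob', 7), ('Tim', 4), ('Krsna', 3)]
```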
Upvotes: 3
Reputation: 947
A simple solution is to store entries as a list of tuples.
In each tuple, put the fields in the priority order you want; if you need descending order for one of the fields, just store its negative.
See the "Priority Queue Implementation Notes" section of the official heapq documentation.
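A minimal sketch of mixing directions within one priority tuple, using hypothetical (name, score) records: the negated score sorts highest first, and the name, left as-is, breaks ties alphabetically.

```python
import heapq

# Hypothetical (name, score) records.
records = [("bob", 90), ("alice", 90), ("carol", 75), ("dave", 95)]

# Priority tuple: negated score (highest first), then name (A to Z on ties).
heap = [(-score, name) for name, score in records]
heapq.heapify(heap)

order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(order)  # ['dave', 'alice', 'bob', 'carol']
```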
Upvotes: 0
Reputation: 110271
According to the heapq documentation, the way to customize the heap order is to make each element on the heap a tuple, with the first tuple element being one that accepts normal Python comparisons.
The functions in the heapq module are a bit cumbersome (since they are not object-oriented) and always require our heap object (a heapified list) to be passed explicitly as the first parameter. We can kill two birds with one stone by creating a very simple wrapper class that lets us specify a key function and presents the heap as an object.
The class below keeps an internal list where each element is a tuple whose first member is a key, calculated at insertion time using the key parameter passed when the heap is instantiated:
# -*- coding: utf-8 -*-
import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x: x):
        self.key = key
        self.index = 0
        if initial:
            self._data = [(key(item), i, item) for i, item in enumerate(initial)]
            self.index = len(self._data)
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), self.index, item))
        self.index += 1

    def pop(self):
        return heapq.heappop(self._data)[2]
(The extra self.index part avoids clashes when two evaluated keys are equal and the stored values are not directly comparable; otherwise heapq could fail with a TypeError.)
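For comparison, the heapq documentation's Priority Queue Implementation Notes also describe a dataclass that compares only its priority field. A sketch in that style; to get key-function behavior like `MyHeap`, you would push `PrioritizedItem(key(x), x)`. The payload values here are made up:

```python
import heapq
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class PrioritizedItem:
    priority: int
    # compare=False keeps the payload out of the generated comparison methods,
    # so payloads never need to be comparable.
    item: Any = field(compare=False)


heap = []
heapq.heappush(heap, PrioritizedItem(3, "write docs"))
heapq.heappush(heap, PrioritizedItem(1, {"not": "comparable"}))
heapq.heappush(heap, PrioritizedItem(2, "fix bug"))

first = heapq.heappop(heap).item
print(first)  # {'not': 'comparable'}
```

Unlike `MyHeap`, this version has no insertion counter, so items with equal priorities are simply treated as ties.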
Upvotes: 172
Reputation: 759
In Python 3, you can use cmp_to_key from the functools module (see its implementation in the CPython source code).
Suppose you need a priority queue of triplets, with the priority given by the last element:
from heapq import heappush
from functools import cmp_to_key

def mycmp(triplet_left, triplet_right):
    key_l, key_r = triplet_left[2], triplet_right[2]
    if key_l > key_r:
        return -1  # larger first
    elif key_l == key_r:
        return 0  # equal
    else:
        return 1

WrapperCls = cmp_to_key(mycmp)
pq = []
myobj = (1, 2, "anystring")
# to push an object myobj into pq
heappush(pq, WrapperCls(myobj))
# to get the heap top use the `obj` attribute
inner = pq[0].obj
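A complete push/pop round trip with the same technique, as a sketch with illustrative triplets and a hypothetical comparison function `by_last_desc`. Every heap entry is a `cmp_to_key` wrapper, so `.obj` is used to get the original value back:

```python
from functools import cmp_to_key
from heapq import heappop, heappush


def by_last_desc(left, right):
    # cmp-style result: negative if left should pop first.
    # Larger last elements pop first.
    return (right[2] > left[2]) - (right[2] < left[2])


TripletKey = cmp_to_key(by_last_desc)

pq = []
for triplet in [(1, 2, 3), (4, 5, 9), (7, 8, 1)]:
    heappush(pq, TripletKey(triplet))

# Each heap entry is a wrapper; .obj gives back the original triplet.
popped = [heappop(pq).obj for _ in range(len(pq))]
print(popped)  # [(4, 5, 9), (1, 2, 3), (7, 8, 1)]
```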
Benchmark (Python 3.10.2):
from functools import cmp_to_key
from timeit import default_timer as time
from random import randint
from heapq import heappush

class WrapperCls1:
    __slots__ = 'obj'
    def __init__(self, obj):
        self.obj = obj
    def __lt__(self, other):
        kl, kr = self.obj[2], other.obj[2]
        return kl > kr

def cmp_class2(obj1, obj2):
    kl, kr = obj1[2], obj2[2]
    return -1 if kl > kr else 0 if kl == kr else 1

WrapperCls2 = cmp_to_key(cmp_class2)

triplets = [[randint(-1000000, 1000000) for _ in range(3)] for _ in range(100000)]
# tuple_triplets = [tuple(randint(-1000000, 1000000) for _ in range(3)) for _ in range(100000)]

def test_cls1():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls1(triplet))

def test_cls2():
    pq = []
    for triplet in triplets:
        heappush(pq, WrapperCls2(triplet))

def test_cls3():
    pq = []
    for triplet in triplets:
        heappush(pq, (-triplet[2], triplet))

start = time()
for _ in range(10):
    test_cls1()
    # test_cls2()
    # test_cls3()
print("total running time (seconds): ", time() - start)
Using list instead of tuple, per function:
…
__slots__: 9.8 ms
Therefore, this method (cmp_to_key) is slightly faster than using a custom class with an overridden __lt__() function and the __slots__ attribute.
Upvotes: 1
Reputation: 573
setattr(ListNode, "__lt__", lambda self, other: self.val < other.val)
Use this to make heapq compare ListNode objects by their val attribute.
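A self-contained sketch of the patch in use. `ListNode` here is a hypothetical minimal stand-in for a class you cannot edit (such as the LeetCode-style linked-list node); note that the patch is global, so every `ListNode` in the program gains this ordering:

```python
import heapq


class ListNode:
    # Hypothetical stand-in for a node class you cannot edit (LeetCode-style).
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


# Patch an ordering onto the class after the fact: nodes compare by val.
# This affects every ListNode in the program, not just those in this heap.
setattr(ListNode, "__lt__", lambda self, other: self.val < other.val)

heap = [ListNode(3), ListNode(1), ListNode(2)]
heapq.heapify(heap)
vals = [heapq.heappop(heap).val for _ in range(len(heap))]
print(vals)  # [1, 2, 3]
```

Using a strict `<` keeps `__lt__` irreflexive, as Python's ordering protocols expect.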
Upvotes: 28
Reputation: 4279
Define a class that overrides the __lt__() method. See the example below (works in Python 3.7):
import heapq

class Node(object):
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __lt__(self, other):
        return self.val < other.val

heap = [Node(2), Node(0), Node(1), Node(4), Node(2)]
heapq.heapify(heap)
print(heap)  # output: [Node value: 0, Node value: 2, Node value: 1, Node value: 4, Node value: 2]
heapq.heappop(heap)
print(heap)  # output: [Node value: 1, Node value: 2, Node value: 2, Node value: 4]
Upvotes: 180
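In CPython 3, heapq only ever calls `<`, so `__lt__` alone is enough for the heap. If the same class also needs `<=`, `>` or `>=` elsewhere, `functools.total_ordering` can derive them from `__lt__` and `__eq__`. A sketch extending the `Node` class above:

```python
import heapq
from functools import total_ordering


@total_ordering
class Node:
    def __init__(self, val: int):
        self.val = val

    def __repr__(self):
        return f'Node value: {self.val}'

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.val == other.val

    def __lt__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.val < other.val


heap = [Node(2), Node(0), Node(1)]
heapq.heapify(heap)
smallest = heapq.heappop(heap)
print(smallest)            # Node value: 0
print(Node(3) >= Node(2))  # True, derived by total_ordering
```

Defining `__eq__` without `__hash__` makes instances unhashable, which matters if the nodes are also stored in sets or used as dict keys.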
Reputation: 31
The limitation of both answers is that they don't allow ties to be treated as ties. In the first, ties are broken by comparing items; in the second, by comparing input order. It is faster to just let ties be ties, and if there are a lot of them it could make a big difference. Based on the above and on the docs, it is not clear whether this can be achieved with heapq. It does seem strange that the heap functions in heapq do not accept a key, while functions derived from them in the same module (nsmallest(), nlargest(), merge()) do.
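P.S.:
If you follow the link in the first comment ("possible duplicate...") there is another suggestion of defining __le__, which seems like a solution (in Python 3, though, heapq compares with < only, so __lt__ is the method that needs defining).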
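The module-level functions mentioned above that do accept a key are `heapq.nsmallest()`, `heapq.nlargest()` and `heapq.merge()` (the latter since Python 3.5). A short sketch with illustrative data:

```python
import heapq

people = [('Tim', 4), ('Radha', 9), ('Rob', 7), ('Krsna', 3)]

youngest_two = heapq.nsmallest(2, people, key=lambda p: p[1])
oldest_two = heapq.nlargest(2, people, key=lambda p: p[1])

# merge() expects each input to be sorted already by the same key.
merged = list(heapq.merge([('x', 1), ('y', 5)], [('z', 2), ('w', 6)],
                          key=lambda p: p[1]))

print(youngest_two)  # [('Krsna', 3), ('Tim', 4)]
print(oldest_two)    # [('Radha', 9), ('Rob', 7)]
print(merged)        # [('x', 1), ('z', 2), ('y', 5), ('w', 6)]
```

These are one-shot helpers over an iterable; they do not give you a persistent heap with a key, which is what the question asks for.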
Upvotes: 3
Reputation: 19329
The heapq documentation suggests that heap elements could be tuples in which the first element is the priority and defines the sort order.
More pertinent to your question, however, is that the documentation includes a discussion with sample code of how one could implement their own heapq wrapper functions to deal with the problems of sort stability and elements with equal priority (among other issues).
In a nutshell, their solution is to make each element in the heap a triple of the priority, an entry count, and the element to be inserted. The entry count ensures that elements with the same priority are sorted in the order they were added to the heap.
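A simplified sketch of the documented pattern, without the documentation's support for removing or updating tasks; `push_task` and `pop_task` are illustrative names here, not part of heapq:

```python
import heapq
import itertools

heap = []
counter = itertools.count()  # unique, increasing tie-breaker


def push_task(task, priority):
    # (priority, entry count, task): equal priorities pop in insertion order,
    # and the task itself is never compared.
    heapq.heappush(heap, (priority, next(counter), task))


def pop_task():
    priority, count, task = heapq.heappop(heap)
    return task


push_task("write code", 5)
push_task("release", 7)
push_task("write spec", 1)
push_task("create tests", 3)
push_task("fix bug", 3)

order = [pop_task() for _ in range(len(heap))]
print(order)  # ['write spec', 'create tests', 'fix bug', 'write code', 'release']
```

Because every count is unique, two entries never compare equal on the first two fields, so the third field (the task) is never reached during comparisons.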
Upvotes: 29