Reputation: 15996
I'm using Python's Queue.PriorityQueue and ran into the following problem: when inserting several elements with the same priority, I would expect the queue to serve them in order of insertion (FIFO). For some reason this is not the case:
>>> from Queue import PriorityQueue
>>>
>>> j1 = (1, 'job1')
>>> j2 = (1, 'job2')
>>> j3 = (1, 'job3')
>>> j4 = (1, 'job4')
>>>
>>> q = PriorityQueue()
>>> q.put(j1)
>>> q.put(j2)
>>> q.put(j3)
>>> q.put(j4)
>>> q.queue
[(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]
>>> q.get()
(1, 'job1')
>>> q.queue
[(1, 'job2'), (1, 'job4'), (1, 'job3')]
As can be seen from the example, the order has been mixed up after one get().
What's the reason? How can I overcome this (keep the order of same-priority elements)?
EDIT:
I was asked to add an example showing that q.get() actually messes up the FIFO ordering, so here's a more elaborate example:
class Job(object):
    def __init__(self, type_, **data):
        self.type_ = type_
        self.priority = 0 if self.type_ == 'QUIT' else 1
        self.data = data
    def __cmp__(self, other):
        return cmp(self.priority, other.priority)
    def __repr__(self):
        return 'Job("' + self.type_ + '", data=' + repr(self.data) + ')'
q = PriorityQueue()
q.put(Job('Build'))
q.put(Job('Clean'))
q.put(Job('QUIT'))
q.put(Job('Create'))
q.put(Job('Build'))
q.put(Job('Clean'))
Now I'll dequeue the elements one by one. The expected result: QUIT goes out first, and then the rest, FIFO ordered: Build, Clean, Create, Build, Clean:
>>> q.get()
Job("QUIT", data={})
>>> q.get()
Job("Build", data={})
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Build", data={}) # <<---
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Create", data={})
Upvotes: 9
Views: 14062
Reputation: 456
Here's working code to make PriorityQueue FIFO. I adapted it from momo's original answer to a different question here:
from dataclasses import dataclass, field
from typing import Any, ClassVar

@dataclass(order=True)
class FifoPriorityQueueItem:
    # Payload; excluded from comparisons so it never has to be orderable.
    data: Any = field(default=None, compare=False)
    priority: int = 10
    # Tie-breaker, assigned in __post_init__; not a constructor argument.
    sequence: int = field(default=0, init=False)
    # Class-wide counter shared by all items.
    counter: ClassVar[int] = 0

    def get_data(self):
        return self.data

    def __post_init__(self):
        self.sequence = FifoPriorityQueueItem.next_seq()

    @staticmethod
    def next_seq():
        FifoPriorityQueueItem.counter += 1
        return FifoPriorityQueueItem.counter

def main():
    import asyncio
    print('with FifoPriorityQueueItem is FIFO')
    q = asyncio.PriorityQueue()
    q.put_nowait(FifoPriorityQueueItem('z'))
    q.put_nowait(FifoPriorityQueueItem('y'))
    q.put_nowait(FifoPriorityQueueItem('b', priority=1))
    q.put_nowait(FifoPriorityQueueItem('x'))
    q.put_nowait(FifoPriorityQueueItem('a', priority=1))
    while not q.empty():
        print(q.get_nowait().get_data())
    print('without FifoPriorityQueueItem is no longer FIFO')
    q.put_nowait((10, 'z'))
    q.put_nowait((10, 'y'))
    q.put_nowait((1, 'b'))
    q.put_nowait((10, 'x'))
    q.put_nowait((1, 'a'))
    while not q.empty():
        print(q.get_nowait()[1])

if __name__ == '__main__':
    main()
Upvotes: 0
Reputation: 28656
Priority queues "are often implemented with heaps" and Python is no exception. As the documentation says, it's "using the heapq module". And heaps don't naturally offer stability. That's also why heapsort "is not a stable sort". If you want stability, you'll need to enforce it yourself. Fortunately it's as simple as storing entries "as 3-element list including the priority, an entry count, and the task".
Note that you give Python's priority queue pairs of priority and task, but the queue doesn't care. It doesn't think of the two values as priority and task. It just thinks of the pair as one "item" and it never even looks into it. Only we, the users, think of the pair as priority and task. So you could also give it task strings alone, without extra priorities. The queue wouldn't even notice. It doesn't try to extract some priority. For its prioritization it just asks the whole item whether it's smaller than another. That's why, when you want to prioritize tasks not just by their natural order (e.g., the string 'job1' being smaller than the string 'job2'), you use a tuple of priority and task. Tuples are ordered lexicographically, so (a, b) is smaller than (c, d) if a is smaller than c, or if they're equal and b is smaller than d. So when the queue asks such a tuple whether it's smaller than another, it's the tuple that looks into itself and considers the priority first and then potentially the task second.
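To make the lexicographic rule concrete, here is a small sketch (Python 3; the variable names are only for illustration) comparing tuples shaped like the ones in the question:

```python
# Tuples compare element by element, left to right.
a = (1, 'job2')
b = (2, 'job1')
c = (1, 'job1')

# Priorities differ (1 < 2), so the strings are never consulted.
first_differs = a < b

# Priorities tie, so the second element decides: 'job1' < 'job2'.
tie_broken = c < a

# Sorting uses the same comparison the queue relies on.
ordered = sorted([a, b, c])
print(first_differs, tie_broken, ordered)
```

The queue performs exactly this kind of "is it smaller?" comparison on whole items; it has no notion of a separate priority field.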
Also, with q.queue you're inspecting the queue's underlying data structure. You shouldn't care about that. Not sure why it's even accessible. But if you do inspect it, you need to look at it as the heap it is, not think of it as a sorted list. It's not that "the order has been mixed" as you put it, it's that you misinterpreted that list. Anyway... the order you should instead care about is the order you actually get with q.get(). If you just get all four items of that example with q.get(), you'll see that it does give them to you in your insertion order. Although that's because you're inserting them in sorted order and they only have one possible order, as there are no equal items. You'll get (1, 'job1') first not because it was inserted first but because it's the smallest of the four tuples (because the priorities are the same and 'job1' is the smallest of the four strings). And you'll get (1, 'job2') second not because it was inserted second but because it's the second-smallest item. And so on. If you inserted them in any other order, you'd still get them in the order (1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4').
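A quick way to check this claim is to insert the four tuples in a deliberately shuffled order. This sketch assumes Python 3, where the module is named `queue` rather than the `Queue` used in the question:

```python
import queue  # Python 3 name; the question's Python 2 module is `Queue`

# Same four items as in the question, but not inserted in sorted order.
jobs = [(1, 'job3'), (1, 'job1'), (1, 'job4'), (1, 'job2')]

q = queue.PriorityQueue()
for job in jobs:
    q.put(job)

# Drain the queue; items come out smallest-first regardless of insertion order.
served = [q.get() for _ in range(len(jobs))]
print(served)
```

The result depends only on how the tuples compare, not on when they were put in.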
About your added example: Your Job objects only compare themselves by their priority. And those Build, Clean, Create, Build and Clean objects all have the same priority. So as far as the queue can tell, they're all equal! That's not like your first example, where your four tuples only allow one possible order. So we're back at what I said at the start: heaps don't naturally offer stability, and if you want stability, you should add an entry count. Check out the explanation and recipe I linked there. It uses a list as heap and uses heapq functions, but you can easily adapt it to use a PriorityQueue instead. Though instead of those separate top-level helper functions, maybe better define your own StablePriorityQueue class, as a subclass or wrapper of PriorityQueue.
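One possible shape for such a wrapper is sketched below. It is only an illustration, not the recipe from the documentation: the class name StablePriorityQueue comes from the paragraph above, but its method signatures are assumptions, it targets Python 3 (`queue` module), and plain strings stand in for the Job objects.

```python
import itertools
import queue


class StablePriorityQueue:
    """Hypothetical wrapper: equal priorities are served in insertion order."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # Entry count used as tie-breaker. In this sketch it is not guarded
        # by the queue's own lock.
        self._count = itertools.count()

    def put(self, priority, task):
        # The count is unique, so comparison never reaches `task`;
        # tasks therefore do not need to be orderable.
        self._queue.put((priority, next(self._count), task))

    def get(self):
        priority, _count, task = self._queue.get()
        return priority, task

    def empty(self):
        return self._queue.empty()


q = StablePriorityQueue()
for name in ['Build', 'Clean', 'QUIT', 'Create', 'Build', 'Clean']:
    q.put(0 if name == 'QUIT' else 1, name)

served = []
while not q.empty():
    served.append(q.get()[1])
print(served)
```

A wrapper was chosen here over a subclass so that the extra count stays hidden from callers; a subclass would work too but has to keep the inherited put/get signatures consistent.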
Upvotes: 8
Reputation: 3306
The other two answers explained what happens, but I want to offer another representation that may help you understand it better.
I took a snapshot from this documentation page about heapq. First of all, you can see that PriorityQueue uses a heappop here.
Now, to the image.
In this image, when you pop the first item, 0 (job1), 1 (job2) takes its place, and then 3 (job4) takes the place of 1 (job2). So this is normal behaviour.
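The same movement can be reproduced without the queue, directly on a list with heapq. A minimal sketch using the question's four tuples (Python 3):

```python
import heapq

# The list from the question; because it is sorted, it is already a valid heap.
heap = [(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]

# heappop removes index 0, moves the last element to the top and sifts it down.
smallest = heapq.heappop(heap)
after_pop = list(heap)
print(smallest, after_pop)
```

The list left behind matches the q.queue output shown in the question: job2 moved up to the root and job4 took job2's old slot.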
Upvotes: 2
Reputation: 39059
As explained here, the Python PriorityQueue is implemented with a binary heap.
A binary heap (here a min-heap) is a binary tree where each node's value is less than or equal to the values of both its children. Hence in such a heap the root always contains the minimum value. Once you remove the minimum node, the heap is reorganized so that the basic heap property still holds.
A heap is usually implemented using an array. With the zero-based indexing that Python's heapq uses, a[k] is the parent of a[2*k+1] and a[2*k+2]. In Python, q.queue is this array. After you remove an element from the heap, the array is reordered in a way that doesn't preserve the original order.
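To see that the reordered array is still a proper heap, the parent/child rule can be checked directly. This sketch assumes the zero-based layout used by Python's heapq, where the children of index k sit at 2*k+1 and 2*k+2; is_min_heap is a hypothetical helper, not part of any library:

```python
def is_min_heap(a):
    """Return True if every parent is <= both of its children (zero-based layout)."""
    n = len(a)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and a[k] > a[child]:
                return False
    return True


# The content of q.queue after the first get() in the question.
after_get = [(1, 'job2'), (1, 'job4'), (1, 'job3')]

valid_heap = is_min_heap(after_get)          # heap property holds
is_sorted = after_get == sorted(after_get)   # but the list is not sorted
print(valid_heap, is_sorted)
```

So the array looks "mixed" only if it is read as a sorted list; read as a heap, it is in a perfectly valid state.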
Upvotes: 3