Omer Dagan
Omer Dagan

Reputation: 15996

Python: Priority queue does'nt keep order on same priority elements

I'm using python's Queue.PriorityQueue, and ran into the following problem: when inserting several elements to the queue which have the same priority, I would expect the queue to serve them in the order of insertion (FIFO). For some reason this is not the case:

>>> from Queue import PriorityQueue
>>>
>>> j1 = (1, 'job1')
>>> j2 = (1, 'job2')
>>> j3 = (1, 'job3')
>>> j4 = (1, 'job4')
>>> 
>>> q = PriorityQueue()
>>> q.put(j1)
>>> q.put(j2)
>>> q.put(j3)
>>> q.put(j4)
>>> q.queue
[(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]
>>> q.get()
(1, 'job1')
>>> q.queue
[(1, 'job2'), (1, 'job4'), (1, 'job3')]

As can be seen from the example, the order has been mixed after one get(). What's the reason? how to overcome (keep the order of same prio elements)?

EDIT:

I was asked to add an example that shows that q.get() actually mess things up with the FIFO ordering, so here's an elaborate example:

class Job(object):
    def __init__(self, type_, **data):
        self.type_ = type_
        self.priority = 0 if self.type_ == 'QUIT' else 1
        self.data = data

    def __cmp__(self, other):
        return cmp(self.priority, other.priority)

    def __repr__(self):
        return 'Job("' + self.type_ + '", data=' + repr(self.data) + ')' 

q = PriorityQueue()
q.put(Job('Build'))
q.put(Job('Clean'))
q.put(Job('QUIT'))
q.put(Job('Create'))
q.put(Job('Build'))
q.put(Job('Clean'))

Now I'll dequeue the elements one by one. The expected result: QUIT goes out first, and then the rest, FIFO ordered: Build, Clean, Create, Build, Clean:

>>> q.get()
Job("QUIT", data={})
>>> q.get()
Job("Build", data={})
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Build", data={}) # <<---
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Create", data={})

Upvotes: 9

Views: 14062

Answers (4)

healthybodhi
healthybodhi

Reputation: 456

Here's actual implementable code to make PriorityQueue FIFO. I adapted from momo's original answer to a different question here:

from dataclasses import dataclass, field
from typing import Any, ClassVar

@dataclass(order=True)
class FifoPriorityQueueItem:
    data: Any=field(default=None, compare=False)
    priority: int=10
    sequence: int=field(default_factory=lambda: {0})
    counter: ClassVar[int] = 0

    def get_data(self):
        return self.data

    def __post_init__(self):
        self.sequence = FifoPriorityQueueItem.next_seq()

    @staticmethod
    def next_seq():
        FifoPriorityQueueItem.counter += 1
        return FifoPriorityQueueItem.counter

def main():
    import asyncio
    print('with FifoPriorityQueueItem is FIFO')
    q = asyncio.PriorityQueue()
    q.put_nowait(FifoPriorityQueueItem('z'))
    q.put_nowait(FifoPriorityQueueItem('y'))
    q.put_nowait(FifoPriorityQueueItem('b', priority=1))
    q.put_nowait(FifoPriorityQueueItem('x'))
    q.put_nowait(FifoPriorityQueueItem('a', priority=1))
    while not q.empty():
        print(q.get_nowait().get_data())

    print('without FifoPriorityQueueItem is no longer FIFO')
    q.put_nowait((10, 'z'))
    q.put_nowait((10, 'y'))
    q.put_nowait((1, 'b'))
    q.put_nowait((10, 'x'))
    q.put_nowait((1, 'a'))
    while not q.empty():
        print(q.get_nowait()[1])

if __name__ == '__main__':
    main()

Upvotes: 0

Stefan Pochmann
Stefan Pochmann

Reputation: 28656

Priority queues "are often implemented with heaps" and Python is no exception. As the documentation says, it's "using the heapq module". And heaps don't naturally offer stability. That's also why heapsort "is not a stable sort". If you want stability, you'll need to enforce it yourself. Fortunately it's as simple as storing entries "as 3-element list including the priority, an entry count, and the task".

Note that you give Python's priority queue pairs of priority and task, but the queue doesn't care. It doesn't think of the two values as priority and task. It just thinks of the pair as one "item" and it never even looks into it. Only we the users think of the pair as priority and task. So you could also give it task strings alone, without extra priorities. The queue wouldn't even notice. It doesn't try to extract some priority. For its prioritization it just asks the whole item whether it's smaller than another. That's why, when you want to prioritize tasks not just by their natural order (e.g., the string 'job1' being smaller than the string 'job2'), you use a tuple of priority and task. Tuples are ordered lexicographically, so (a, b) is smaller than (c, d) if a is smaller than c or if they're equal and b is smaller than d. So when the queue asks such a tuple whether it's smaller than another, it's the tuple that looks into itself and considers the priority first and then potentially the task second.

Also, with q.queue you're inspecting the queue's underlying data structure. You shouldn't care about that. Not sure why it's even accessible. But if you do inspect it, you need to look at it as the heap it is, not think of it as a sorted list. It's not that "the order has been mixed" as you put it, it's that you misinterpreted that list. Anyway... the order you should instead care about is the order you actually get. With q.get(). If you just get all four items of that example with q.get(), you'll see that it does give them to you in your insertion order. Although that's because you're inserting them in sorted order and they only have one possible order, as there are no equal items. You'll get (1, 'job1') first not because it was inserted first but because it's the smallest of the four tuples (because the priorities are the same and 'job1' is the smallest of the four strings). And you'll get (1, 'job2') second not because it was inserted second but because it's the second-smallest item. And so on. If you inserted them in any other order, you'd still get them in order (1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4').

About your added example: Your Job objects only compare themselves by their priority. And those Build, Clean, Create, Build and Clean objects all have the same priority. So as far as the queue can tell, they're all equal! That's not like your first example, where your four tuples only allow one possible order. So we're back at what I said at the start, heaps don't naturally offer stability and if you want stability, you should add an entry count. Check out the explanation and recipe I linked there. It uses a list as heap and uses heapq functions, but you can easily adapt it to use a PriorityQueue instead. Though instead of those separate top-level helper functions, maybe better define your own StablePriorityQueue class, as subclass or wrapper of PriorityQueue.

Upvotes: 8

IMCoins
IMCoins

Reputation: 3306

The other 2 answers explained what happens.

Although I want to offer you another representation that will help you have a better understanding.

I took a snapshot from this documentation page about heapq. First of all, you can see that PriorityQueue uses a heappop here

Now, to the image.

heappop

In this image, when you pop the first item 0 (job1), 1 ('job2') will take its place, and then, 3 (job4) will take 1 (job2) place. We should conclude by saying this is a normal behaviour.

Upvotes: 2

zmbq
zmbq

Reputation: 39059

As explained here, the Python PriorityQueue is implemented with a binary heap.

A binary heap is a binary tree where each node's value is equal or greater the values of both its children. Hence in a binary heap the root always contains the minimum value. Once you remove the minimum node, the heap is reorganized so that the basic heap property is still in effect.

A heap is usually implemented using an array, where a[k] is the parent of a[2*k] and a[2*k+1]. In Python, q.queue is this array. After you remove an element from the heap, the array is reordered in a way that doesn't preserve the original order.

Upvotes: 3

Related Questions