Reputation: 738
I noticed that std::priority_queue
stores the elements in sorted manner. Obviously storing elements in sorted manner would be a bad design choice as time complexity of push
and pop
would shoot up to O(n)
. But it turns out std::priority_queue
magically sort elements in linear time.
Here is the code that I used for testing.
#include <iostream>
#include <queue>
#include <algorithm>
#include <vector>
#include <chrono>
#include <random>
#include <climits>
#include <fstream>
#include <ios>
int main() {
int size = 10'000'000;
std::random_device rd;
std::mt19937 mt{rd()};
std::uniform_int_distribution<int> uid{1, INT32_MAX};
std::vector<int> vs;
for (int i = 0; i < size; ++i) {
vs.push_back(uid(mt));
}
// Measures time taken by make_heap
std::vector<int> vs1{vs};
auto start = std::chrono::system_clock::now();
std::make_heap(vs1.begin(), vs1.end());
auto end = std::chrono::system_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time taken by make_heap: " << diff.count() << std::endl;
// Measures time taken by priority_queue
std::vector<int> vs2{vs};
start = std::chrono::system_clock::now();
std::priority_queue<int, std::vector<int>, std::greater<int>> qs{vs2.begin(), vs2.end()};
end = std::chrono::system_clock::now();
diff = end - start;
std::cout << "Time taken by priority_queue: " << diff.count() << std::endl;
// Measures time taken by sort
std::vector<int> vs3{vs};
start = std::chrono::system_clock::now();
std::sort(vs3.begin(), vs3.end());
end = std::chrono::system_clock::now();
diff = end - start;
std::cout << "Time taken by sort: " << diff.count() << std::endl;
std::ofstream ofile;
ofile.open("priority_queue_op.txt", std::ios::out);
for (int i = 0; i < size; ++i) {
ofile << qs.top() << std::endl;
qs.pop();
}
ofile.close();
ofile.open("sort_op.txt", std::ios::out);
for (auto& v : vs3)
ofile << v << std::endl;
ofile.close();
// run `diff priority_queue_op.txt sort_op.txt`
return 0;
}
$ g++ -O3 test.cpp -o test
$ ./test
Time taken by make_heap: 0.133292
Time taken by priority_queue: 0.151002
Time taken by sort: 0.910701
$ diff priority_queue_op.txt sort_op.txt
$
From the above output it is seems like std::priority_queue
is sorting the elements in linear time.
This site suggests that std::priority_queue
uses heap functions from standard library to manage heap internally. Even the source code confirms it.
template<typename _InputIterator>
priority_queue(_InputIterator __first, _InputIterator __last,
const _Compare& __x = _Compare(),
_Sequence&& __s = _Sequence())
: c(std::move(__s)), comp(__x)
{
__glibcxx_requires_valid_range(__first, __last);
c.insert(c.end(), __first, __last);
std::make_heap(c.begin(), c.end(), comp);
}
An insert procedure is used to insert elements followed by std::make_heap
to build the heap. So how are the elements magically sorted? And even if there is something how is it happening in linear time?
Upvotes: 2
Views: 246
Reputation: 64011
So how are the elements magically sorted?
They are not "sorted".
They are arranged in such a way that the element that should be at the "top" is at the correct location. Other elements are not guaranteed to be sorted. Instead, they are arranged in what is known as a heap.
Put another way, a std::priority_queue
is a container optimized to provide fast access to the object that logically belongs in the "top", with the assumption that other elements won't be accessed until they belong at the top.
That "other elements won't be accessed" condition is what allows an organization that is faster than complete sorting.
And even if there is something how is it happening in linear time?
If you're referring to insertions/deletions, it actually does it in logarithmic time.
In logarithmic time, it is guaranteed that the front item is at the correct position, while all other items are guaranteed to form a valid heap. This doesn't take long if the data was already a heap prior to insertion/deletion.
Upvotes: 5
Reputation: 275976
I noticed that
std::priority_queue
stores the elements in sorted manner.
This isn't true.
std::priority_queue
can provide the elements in order, but they are not sorted. The operation of poping does work more than just "take the top element".
Insertion and extraction requires logarithmic time.
Setup of a batch of elements takes linear time.
The trick is that the priority queue uses a data structure known as, or complexity equivalent to, "a heap". This is not to be confused with the colloquial name C/C++ programmers give the "free store" of memory that malloc
and new
allocate from.
A heap is not sorted in the way you might think. It is arranged so that, as you grab elements from it, you know where the largest (or smallest) element is, and you can get the next largest element in logarithmic time.
A really, really naive way to think of this is to imagine a binary tree, where each of the children are guaranteed to be less than the parent.
The largest element is the root. When you eliminate the root, you know that the next largest element is either the left or right child of the root; check, and promote it.
This isn't exactly how a heap works, but it gives you an intuitive way to understand why it is plausible it works.
The magic of the "heap" is being able to do this in a random access buffer without moving too much stuff around in logarithmic time, and take an unsorted set of items in that same random access buffer and arrange it into a "heap" inplace in linear time.
The wikipedia article for Heap isn't that bad. If you want to know more, that or a good undergrad data structure's textbook is a good thing to look at.
Upvotes: 4