Reputation: 77454
Binary heaps are commonly used in, e.g., priority queues. The basic idea is that of an incomplete heap sort: you keep the data sorted "just enough" to get the top element out quickly.
While 4-ary heaps are theoretically worse than binary heaps, they also have some benefits. For example, they require fewer heap restructuring operations (as the heap is much shallower), while obviously needing more comparisons at each level. But (and that is probably their main benefit?) they may have better CPU cache locality. So some sources say that 3-ary and 4-ary heaps outperform both Fibonacci and binary heaps in practice.
They should not be much harder to implement; the additional cases are just some extra if cases.
Has anyone experimented with 4-ary (and 3-ary) heaps for priority queues and done some benchmarking? In Java you never know whether they are faster or slower before you have benchmarked them extensively. And from all I've found via Google, it may be quite language and use case dependent. Some sources say that they found 3-ary heaps to perform best for them.
Some more points:
PriorityQueue obviously is a binary heap. But the class, for example, also lacks bulk loading and bulk repair support, as well as replaceTopElement, which can make a huge difference. Bulk loading, for example, is O(n) instead of O(n log n); bulk repair is essentially the same after adding a larger set of candidates. Tracking which parts of the heap are invalid can be done with a single integer. replaceTopElement is much cheaper than poll + add (just consider how a poll is implemented: replace the top element with the very last one).
Upvotes: 6
Views: 2701
Reputation: 8705
I have not benchmarked 4-ary heaps yet. I'm currently trying to optimize our own heap implementations, and I'm trying 4-ary heaps there, too. And you are right: we will need to benchmark this carefully, as it is easy to get misled by implementation differences, and HotSpot optimization will heavily affect the results. Plus, small heaps will probably show different performance characteristics than large heaps.
The Java PriorityQueue is a very simple heap implementation, but that means HotSpot will optimize it well. It's not bad at all: most people would implement a worse heap. But, for example, it indeed does not do efficient bulk loads or bulk adds (bulk repairs). However, in my experiments it was hard to consistently beat this implementation even in simulations with repeated inserts, unless you go for really large heaps. Furthermore, in many situations it pays off to replace the top element in the heap instead of poll() + add(); this is not supported by Java's PriorityQueue.
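To illustrate why that matters, here is a minimal sketch of a binary min-heap of primitive doubles (illustrative names only, neither ELKI's nor the JDK's actual code): replaceTop() does a single sift-down from the root, whereas poll() followed by add() pays for a sift-down plus a separate sift-up.

```java
import java.util.Arrays;

// Minimal sketch of a binary min-heap of primitive doubles.
// Names are illustrative only; this is not ELKI's (or the JDK's) actual code.
public class SimpleDoubleMinHeap {
  private double[] heap = new double[11];
  private int size = 0;

  public void add(double v) {
    if (size == heap.length) {
      heap = Arrays.copyOf(heap, heap.length << 1);
    }
    int i = size++;
    while (i > 0) { // sift up the new element
      int parent = (i - 1) >>> 1;
      if (heap[parent] <= v) {
        break;
      }
      heap[i] = heap[parent];
      i = parent;
    }
    heap[i] = v;
  }

  // poll() = remember the root, move the very last element to the top, sift it down.
  public double poll() { // assumes a non-empty heap
    double top = heap[0];
    double last = heap[--size];
    if (size > 0) {
      siftDown(last);
    }
    return top;
  }

  // replaceTop() = overwrite the root with the new element and do one sift-down;
  // cheaper than poll() + add(), which additionally pays for a sift-up.
  public double replaceTop(double v) { // assumes a non-empty heap
    double top = heap[0];
    siftDown(v);
    return top;
  }

  private void siftDown(double v) {
    int i = 0, half = size >>> 1;
    while (i < half) { // while node i has at least one child
      int child = (i << 1) + 1;
      if (child + 1 < size && heap[child + 1] < heap[child]) {
        child++; // pick the smaller child
      }
      if (v <= heap[child]) {
        break;
      }
      heap[i] = heap[child];
      i = child;
    }
    heap[i] = v;
  }
}
```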
Some of the performance gains in ELKI (and I've seen you are an ELKI user) across versions are actually due to improved heap implementations. But it goes back and forth; it's hard to predict which heap variation performs best across real workloads. The key benefit of our implementation is probably the "replaceTopElement" function. You can inspect the code here:
SVN de.lmu.ifi.dbs.elki.utilities.heap package
You will notice we have a whole set of heaps there. They are optimized for different stuff, and will need some more refactoring. A number of these classes are actually generated from templates, similar to what GNU Trove does. The reason is that Java can be quite costly when managing boxed primitives, so it does pay off to have primitive versions. (yes, there are plans to split this out into a separate library. It's just not of high priority.)
Note that ELKI deliberately does not endorse the java.util.Collections API. We have found the java.util.Iterator interface in particular to be quite costly, and thus try to encourage people to use C++-style iterators throughout ELKI:
for (Iter iter = ids.iter(); iter.valid(); iter.advance()) { ... }
This often saves a lot of unnecessary object creations compared to the java.util.Iterator API. Plus, these iterators can have multiple (and primitive) value getters, whereas Iterator.next() is a mixture of a getter and the advance operator.
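The pattern looks roughly like this (a sketch of the idea with illustrative names, not ELKI's exact interfaces):

```java
// Sketch of the C++-style iterator idea over a primitive array
// (illustrative names, not ELKI's exact API).
interface DoubleIter {
  boolean valid();    // still pointing at an element?
  void advance();     // move to the next element
  double getDouble(); // primitive getter: no boxing, no Object allocation
}

class DoubleArrayList {
  private final double[] data;
  private final int size;

  DoubleArrayList(double[] data, int size) {
    this.data = data;
    this.size = size;
  }

  DoubleIter iter() {
    return new DoubleIter() {
      private int pos = 0;

      public boolean valid() { return pos < size; }
      public void advance() { pos++; }
      public double getDouble() { return data[pos]; }
    };
  }
}

// Usage, mirroring the loop above; note that the getter and the advance
// operation are separate, unlike java.util.Iterator.next():
//   for (DoubleIter it = list.iter(); it.valid(); it.advance()) {
//     double v = it.getDouble();
//   }
```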
Ok, I have drifted off too much now, back to the topic of 4-ary heaps:
If you intend to try out 4-ary heaps, I suggest you start with the ObjectHeap class there.
Update: I've been microbenchmarking, but the results so far are inconclusive. It's hard to beat PriorityQueue consistently. In particular, bulk loading and bulk repairs do not seem to cut any time in my benchmark - probably they cause HotSpot to optimize less, or to de-optimize at some point. As is often the case, simpler Java code is faster than complex logic. So far, 4-ary heaps without bulk loading seem to work best. I haven't tried 5-ary yet. 3-ary heaps are about equal with 4-ary heaps, and the memory layout of 4-ary is a bit nicer. I'm also considering trying a heap-of-heaps approach to save on array resizing. But I expect that the increased code complexity means it will run slower in practice.
Upvotes: 1
Reputation: 77454
As per @ErichSchubert's suggestion, I have taken the implementations from ELKI and modified them into a 4-ary heap. It was a bit tricky to get the indexing right, as a lot of the publications around 4-ary heaps use formulas for 1-indexed arrays?!?
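For reference, the 0-indexed arithmetic of a 4-ary heap boils down to the following (a sketch with illustrative names, not the exact ELKI-derived code): the children of node i sit at 4i+1 .. 4i+4, and the parent of node j is (j-1)/4.

```java
// Index arithmetic and sift-down for a 0-indexed 4-ary min-heap of primitive doubles.
// A sketch with illustrative names, not the exact ELKI-derived code.
final class FourAryHeapOps {
  static int parent(int j) {
    return (j - 1) >>> 2; // integer division by 4
  }

  static int firstChild(int i) {
    return (i << 2) + 1;  // 4 * i + 1; the children are 4i+1 .. 4i+4
  }

  // Sift the value v down from position i in a heap of the given size.
  static void siftDown(double[] heap, int size, int i, double v) {
    while (true) {
      int child = firstChild(i);
      if (child >= size) {
        break; // node i is a leaf
      }
      int end = Math.min(child + 4, size); // up to four children
      int best = child;
      for (int c = child + 1; c < end; c++) {
        if (heap[c] < heap[best]) {
          best = c; // smallest of the children
        }
      }
      if (v <= heap[best]) {
        break;
      }
      heap[i] = heap[best];
      i = best;
    }
    heap[i] = v;
  }
}
```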
Here are some early benchmark results, based on the ELKI unit test. 200000 Double objects are preallocated (to avoid measuring memory management too much) and shuffled.
As a warmup, 10 iterations are performed for each heap; for benchmarking, 100 iterations, but I'll probably try to scale this up further. 10-30 seconds isn't that reliable for benchmarking yet, and OTOH I should try to measure standard deviations, too. In each iteration, the 200000 elements are added to the heap, then half of them are polled again. Yes, the workload could also be made more complex.
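In rough code, one such iteration looks like this (a sketch of the workload described above, using java.util.PriorityQueue as the example heap; this is not the actual ELKI unit test):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.Random;

// Sketch of one benchmark iteration as described above (not the actual ELKI test code).
public class HeapBenchSketch {
  public static void main(String[] args) {
    final int n = 200000;
    Double[] data = new Double[n];
    Random rnd = new Random(0);
    for (int i = 0; i < n; i++) {
      data[i] = Double.valueOf(rnd.nextDouble()); // preallocate boxed values up front
    }
    Collections.shuffle(Arrays.asList(data), rnd); // shuffle outside the timed section

    long start = System.nanoTime();
    PriorityQueue<Double> pq = new PriorityQueue<>();
    for (Double d : data) {
      pq.add(d);            // add all 200000 elements
    }
    for (int i = 0; i < n / 2; i++) {
      pq.poll();            // then poll half of them again
    }
    System.out.println((System.nanoTime() - start) / 1e9 + " s");
  }
}
```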
Here are the results:
DoubleMinHeap (4-ary): 10.371
DoubleMinHeap: 12.356
Heap<Double>: 37.458
PriorityQueue<Double>: 45.875
So the difference between the 4-ary heap (probably not yet L1 cache-aligned!) and the ELKI heap for primitive doubles is not too big. Well, 10%-20% or so; it could be worse.
The difference between a heap for primitive doubles and a heap for Double objects is much larger. And the ELKI Heap is indeed quite clearly faster than the Java PriorityQueue (but that one seems to have a high variance).
There was a slight "bug" in ELKI, though - at least the primitive heaps did not use the bulk loading code yet. It's there, it's just not being used, as every element repairs the heap immediately instead of delaying this until the next poll(). I fixed this for my experiments, essentially by removing a few lines and adding one ensureValid(); call. Furthermore, I also don't have a 4-ary object heap yet, and I haven't included ELKI's DoubleObjectMinHeap yet... quite a lot to benchmark, and I'll probably give Caliper a try for that.
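The idea behind that deferral, roughly (a sketch with made-up names, not ELKI's actual implementation): add() only appends, a single validSize counter remembers how much of the array is still a valid heap, and ensureValid() repairs lazily before the next poll(). Bottom-up heapify makes the bulk repair O(n) rather than O(n log n) for n individual sift-up insertions; this sketch always re-heapifies everything, whereas a real implementation would only repair the invalidated part.

```java
import java.util.Arrays;

// Sketch of deferred ("bulk") heap repair tracked with a single integer.
// Names are made up for illustration; this is not ELKI's actual implementation.
public class LazyDoubleMinHeap {
  private double[] heap = new double[16];
  private int size = 0;      // total number of stored elements
  private int validSize = 0; // the prefix [0, validSize) is a valid heap

  public void add(double v) {
    if (size == heap.length) {
      heap = Arrays.copyOf(heap, heap.length << 1);
    }
    heap[size++] = v; // just append; repair is deferred
  }

  public double poll() { // assumes a non-empty heap
    ensureValid();       // lazily repair before reading the top
    double top = heap[0];
    heap[0] = heap[--size];
    validSize = size;
    if (size > 0) {
      siftDown(0);
    }
    return top;
  }

  private void ensureValid() {
    if (validSize == size) {
      return; // nothing was appended since the last repair
    }
    // Simple bottom-up heapify of the whole array; O(n) overall.
    for (int i = (size >>> 1) - 1; i >= 0; i--) {
      siftDown(i);
    }
    validSize = size;
  }

  private void siftDown(int i) {
    double v = heap[i];
    int half = size >>> 1;
    while (i < half) {
      int child = (i << 1) + 1;
      if (child + 1 < size && heap[child + 1] < heap[child]) {
        child++;
      }
      if (v <= heap[child]) {
        break;
      }
      heap[i] = heap[child];
      i = child;
    }
    heap[i] = v;
  }
}
```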
Upvotes: 2
Reputation: 106351
I haven't benchmarked it myself, but I have a few relevant points to make.
Firstly, note that the standard Java implementation of PriorityQueue uses a binary heap.
It is plausibly the case that, despite the cache locality benefit of n-ary heaps, binary heaps are still the best solution on average. Below are some slightly hand-wavy reasons why this might be the case:
Of course you'd need to do your own benchmarks on your own data before you come to a real conclusion about which performs best (and whether the difference is large enough to care about, which I personally doubt...).
EDIT
Also, I did write a priority heap implementation using an array of primitive keys that may be of interest given the original poster mentioned primitive keys in the comment below:
This could probably be hacked into an n-ary version for benchmarking purposes relatively easily if anyone was interested in running a test.
Upvotes: 1