Reputation: 1955
I understand (or at least I think I do) heapsort. I was looking through an algorithms book I have and came across this problem, which I'm having trouble understanding. They are using a max heap for this:
LARGE is the bigger half of the numbers to be sorted, i.e. LARGE = { n/2, n/2 + 1, ..., n }.
Initially, how many items in LARGE can be at the bottom level of the heap, after build heap phase of HeapSort?
Initially, how many items in LARGE can be within the top (log n)/4 levels of the heap, after build heap phase of HeapSort?
Let X be the items in LARGE which are neither in the bottom level initially nor within the top (log n)/4 levels initially. At least how many items are in X?
After build heap, aren't you supposed to heapify? That would move all the larger elements up the tree, so none of the elements in LARGE would be at the bottom level, correct? Maybe they are talking about the state after build heap and before heapify, which would mean at most half of the LARGE elements could be in the bottom level of the heap? I'm mainly just looking for any clarifications on the question.
Sorry about the edit, I wasn't sure if it was allowed to ask a new question based on the remaining questions I have.
Upvotes: 1
Views: 800
Reputation: 51316
[EDIT: Well it seems you've actually deleted the part of your question that my answer answers! Hope it's still useful.]
pjs's comment is a good concrete example to get you started. Apologies for the wall of text that follows; perhaps there's a dramatically simpler way to analyse the problem, but I don't know it!
More generally, we can approach this mathematically by supposing that some element from LARGE is in the bottom level, and then looking at what implications this has. If these implications generate a contradiction, then it's impossible for the supposed thing to occur; but if they don't, then the implications we discover might give us a way to construct examples. (Or it might mean that we haven't looked hard enough for a contradiction; but in this case, because of pjs's example, we already know that they can't produce one. Still, making assumptions and then looking for contradictions is a good way to start out if no concrete example comes to mind immediately.)
So, let's assume that some element x in LARGE is a leaf in the heap. What does this imply? In other words, in what ways does this constrain the rest of the heap? First, the heap property in a max-heap tells us that a parent can never be smaller than its child. So x's parent y must be no smaller than x, and y's parent z must be no smaller than y, and so on up to the root. How many nodes are there on this path from x to the root? For a full heap (i.e. with n = 2^k-1 nodes, for some k), there will be exactly log2(n+1)-1, not including x itself. (For non-full heaps, it's slightly more complicated, but let's not worry about that here.) So, we can certainly conclude that x cannot be one of the largest log2(n+1)-1 elements, because if it was, there would not be enough other, larger elements to place in the chain of ancestors up to the root. (In fact, it's (again) slightly more complicated, due to the fact that a heap can contain duplicate elements. But the above holds for heaps of unique elements, and it doesn't hurt to consider just these initially.)
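To make the length of that ancestor chain concrete, here's a tiny sketch (my own illustration, not from the book) using the standard array layout of a binary heap, where the parent of index i > 0 is (i - 1) // 2. For a full heap on n = 2^k - 1 nodes, every bottom-level leaf has exactly k - 1 = log2(n+1) - 1 strict ancestors:

```python
def ancestor_count(i):
    """Number of strict ancestors of node i in an array-based heap,
    where the parent of index i (for i > 0) is (i - 1) // 2."""
    count = 0
    while i > 0:
        i = (i - 1) // 2
        count += 1
    return count
```

For n = 31 (so k = 5), the bottom-level leaves occupy indices 15 through 30, and each of them has 4 = log2(32) - 1 ancestors.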
So, let's try making x the (log2(n+1))th-largest element, and putting all larger elements in its chain of ancestors. If log2(n+1) < n/2, then x will be a member of LARGE: this is certainly true for all n >= 7. (Considering only full heaps, this bound is tight: for n = 3, it's clearly not possible for the single element in LARGE to appear anywhere but at the root.) Is it possible to continue assigning the remaining elements to places in this heap until it's complete? Yes!
Picture x's chain of ancestors as a "/"-shaped chain with the root at the top-right and x at the bottom-left, i.e. every node in this chain (except x itself) has a left child, but no right child. We need to slot the remaining elements into subtrees (sub-heaps, in fact) leading from those right children. The right child of the root needs to become a full sub-heap containing 2^(k-1)-1 elements; the right child of the root's left child needs to become a full sub-heap containing 2^(k-2)-1 elements; and so on. Notice in particular that these log2(n+1)-1 sub-heaps are completely independent from each other: they don't constrain each other in any way. Because no remaining element is larger than any of the elements in the "/"-shaped chain, it's impossible for us to accidentally violate the max-heap property by picking a child that is larger than its parent on this chain, so we don't need to worry about that. We can simply take any old set of 2^(k-1)-1 of the remaining elements, build a heap from them, and attach that as the right child of the root; then take any old set of 2^(k-2)-1 of the elements that remain after that, build a heap from them, and attach that as the right child of the root's left child; and so on until we have built a complete heap on 2^k-1 elements.
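As a sanity check, here's a sketch of this construction (my own code, with hypothetical names; indices follow the usual array layout with the children of i at 2i+1 and 2i+2). For simplicity it fills the independent sub-heaps by assigning the remaining values in decreasing level order, which is just one of the many valid choices:

```python
def build_heap_with_large_leaf(n):
    """Construct an array-based max-heap on the values 1..n (n = 2^k - 1)
    whose bottom-left leaf is x = n - k + 1, the (log2(n+1))-th largest
    value. Children of index i live at 2i+1 and 2i+2."""
    k = (n + 1).bit_length() - 1          # number of levels
    assert n == 2 ** k - 1, "n must be 2^k - 1 (full heap)"
    heap = [0] * n
    spine = set()                         # indices of the left spine
    idx = 0
    for i in range(k):
        heap[idx] = n - i                 # k largest values down the spine
        spine.add(idx)
        idx = 2 * idx + 1                 # step to the left child
    # Fill the remaining slots with the rest of the values in decreasing
    # order, in level order: every non-spine parent then precedes (and
    # exceeds) its children, and spine parents exceed everything else,
    # so the max-heap property holds throughout.
    rest = iter(range(n - k, 0, -1))
    for pos in range(n):
        if pos not in spine:
            heap[pos] = next(rest)
    return heap
```

For n = 15 (k = 4) this puts x = 12 at the bottom-left leaf, and 12 > 15/2, so a LARGE element really does sit on the bottom level.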
So it's certainly possible to construct a heap with one element from LARGE on the bottom, provided there are at least 7 elements. But how many elements from LARGE can we fit there while still satisfying the max-heap property? I won't go into too much detail (partly because I haven't thought it through completely myself!) but here are some ideas:
Suppose we want to have 2^r elements of LARGE on the bottom row. One way to approach this is by making them all leaves in a height-r sub-heap, which must contain 2^(r+1)-1 elements in total, of which 2^r-1 will be non-leaves. (The 2^r leaves must somehow be arranged into 2^(r-1) pairs that have parents, which must themselves be arranged into 2^(r-2) pairs that have parents, and so on; summing up the element counts at all levels up to the root gives 2^(r+1)-1).
The nodes in this sub-heap must, of course, obey the max-heap property. One simple way to guarantee this is to sort all 2^(r+1)-1 nodes, take the smallest 2^r of them, and assign them (in any order) to leaves; then take the next smallest 2^(r-1) of them, and assign them (again in any order) to the parents of these leaves; and so on.
We still need to take care of the fact that the root of this sub-heap needs to be smaller than all of its ancestors in the chain up to the root of the original, complete heap on n elements. It's similar to the previous situation with just a single LARGE leaf, but the root of our sub-heap is r levels higher than x was back there, so we only need log2(n+1)-r-1 ancestors (instead of log2(n+1)-1) in this chain. Therefore, to get 2^r given elements into the bottom row of the complete heap using this construction, we need "machinery" consisting of 2^r-1 larger elements to fill out the sub-heap above them, plus log2(n+1)-r-1 even larger elements for the chain up to the root, for a total of 2^r+log2(n+1)-r-2 larger elements. Since we want as many of these 2^r elements as possible to be in LARGE, it makes sense to use the largest possible elements: take the overall largest 2^r+log2(n+1)-r-2 elements to use for the machinery, and then take the next-largest 2^r elements to be the leaves. Provided r is small enough, all or some of these leaves will be in LARGE. (Notice that for r = 0, we recover the original single-LARGE-leaf construction.) To find out how many elements from LARGE we can get onto that bottom row using this technique, keep increasing r until one or more of the 2^r leaves is no longer in LARGE; either the number of LARGE leaves in this iteration or in the previous iteration is maximal, since increasing r further will just result in even more LARGE elements being spent on machinery.
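To see how that trade-off plays out, here's a small sketch (my own code; the name max_large_leaves is hypothetical) that evaluates the machinery count 2^r + log2(n+1) - r - 2 for increasing r and counts how many of the next-largest 2^r values, the designated leaves, fall in LARGE, taken here as the values >= n // 2 in a heap on the distinct values 1..n:

```python
def max_large_leaves(n):
    """For a full heap on the distinct values 1..n (n = 2^k - 1), count
    how many LARGE values (those >= n // 2) this construction can place
    on the bottom level, trying r = 0, 1, 2, ... and keeping the best
    result. Note that log2(n+1) = k here."""
    k = (n + 1).bit_length() - 1
    best = 0
    for r in range(k):                    # sub-heap of height r
        machinery = 2 ** r + k - r - 2    # larger elements used as scaffolding
        if machinery + 2 ** r > n:
            break                         # not enough elements left
        # The 2^r leaves receive ranks machinery+1 .. machinery+2^r,
        # where rank t means the t-th largest value, i.e. n - t + 1.
        large = sum(1 for t in range(machinery + 1, machinery + 2 ** r + 1)
                    if n - t + 1 >= n // 2)
        best = max(best, large)
    return best
```

For n = 15 this reports 4: with r = 2, the machinery consumes the four largest values 15..12, and the four leaves 11, 10, 9, 8 are all in LARGE.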
The above is a way to construct a heap with 2^r "quite large" elements on the bottom. It may or may not be the optimal way to fit as many LARGE elements on the bottom as possible, and even if it is, that remains to be proven. It may well not be, even under the various simplifying assumptions we've made (full heap, distinct elements), since we have actually constructed something that satisfies stronger constraints than are required of a heap (specifically, in the sub-heap that has the 2^r elements as its leaves, our construction ensures that every parent is larger than every leaf, even though this is not required of a heap).
In fact, I can think of one improvement already. When arranging 2^(r+1)-1 elements into a sub-heap, instead of assigning the 2^r smallest as leaves, we can sneak a few larger leaves in by doing the following: take the 3 smallest elements, make the 2 smallest of these leaves, and make the 3rd-smallest their parent. Repeat this procedure with the 4th- through 6th-smallest elements to produce another 3-element heap. Now take the 7th-smallest element to be the parent of these 2 3-element heaps, and so on. This works better because it smuggles a few (at least 2^r/3) of the smallest elements higher up into the sub-heap, meaning that more large (and more LARGE) elements will make it into the 2^r elements on the bottom row. I suspect it's optimal, since neither the smallest nor the 2nd-smallest element can be a non-leaf in any full sub-heap.
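To illustrate (this is my own sketch of the idea, with a hypothetical name): the "triples" procedure amounts to giving the largest value to the root of each full subtree and recursively splitting the rest evenly between its two subtrees, which reproduces the pattern of leaves 1 and 2 under parent 3, leaves 4 and 5 under parent 6, 7 above both, and so on:

```python
def build_leafy_heap(vals):
    """Build a full max-heap (as a level-order list) from vals sorted
    ascending (length 2^(h+1) - 1): the largest value becomes the root
    and each half of the remainder fills one subtree the same way, so
    small values become internal nodes as early as possible."""
    m = len(vals)
    if m == 1:
        return list(vals)
    half = (m - 1) // 2
    left = build_leafy_heap(vals[:half])
    right = build_leafy_heap(vals[half:m - 1])
    out = [vals[-1]]                      # largest value is the root
    start, width = 0, 1
    while start < len(left):              # interleave the two subtrees
        out += left[start:start + width]  # level by level to keep the
        out += right[start:start + width] # level-order layout intact
        start += width
        width *= 2
    return out
```

For the 15 values 1..15 this yields the leaves {1, 2, 4, 5, 8, 9, 11, 12}: compared with the layered assignment from before (leaves {1, ..., 8}), the values 9, 11 and 12 have been smuggled onto the bottom row.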
Upvotes: 1