Jackson Tale
Jackson Tale

Reputation: 25812

How to implement a binary heap using list in OCaml?

I am implementing a binary heap using list in OCaml, just to sharpen my OCaml skills.

I feel it very difficult using list and after struggling for 2 days, I have to come here for suggestions and hints.


Here is my thought so far

Obviously, I can't use the orignal array based algorithm to implement it using list.

What I am trying to utilise is binary tree. I have keep the invariant that a node should be bigger than any node whose level is lower than its.

I roughly figured out how to implement insert, although I am not sure whether it is correct or not.

For the binary tree, each node has two children, value and size n which is the total number of offsprings it has. This n is used to balance the tree.

When inserting x, I compare with a node (from root, recursively). Assume x < the value of the node, then

If one or both of the node's children are Leaf, then I insert the x to that Leaf place.

If none of the node's children are Leaf, then I will choose the child whose n is less and then recursively insert.


Here is my code

type 'a heap = 
  | Node of 'a * 'a heap * 'a heap * int
  | Leaf 

exception EmptyHeapException

let create_heap () = Leaf;;

let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf, 0)
  | Node (v, l, r, n) ->
    let (stay, move) = if x > v then (x, v) else (v, x)
    in 
    match (l, r) with 
      | (Leaf, Leaf) -> 
        Node (stay, Node (move, Leaf, Leaf, 0), Leaf, 1)
      | (Leaf, _) -> 
        Node (stay, Node (move, Leaf, Leaf, 0), r, n+1)
      | (_, Leaf) ->
        Node (stay, l, Node (move, Leaf, Leaf, 0), n+1)
      | (Node (_, _, _, n1), Node (_, _, _, n2)) ->
        if n1 <= n2 then
          Node (stay, (insert move l), r, n1+1)
        else 
          Node (stay, l, (insert move r), n2+1);;

Ok, I have following questions.

  1. Am I heading to the correct direction? Is my thought or implementation correct?
  2. I get stuck in implementing get_top function. I don't know how to continue. any hints?
  3. ocaml batteries implemented an efficient batHeap.ml. I have had a look, but I feel its way is totally different from mine and I can't understand it. Any one can help me understanding it?

Upvotes: 2

Views: 2669

Answers (1)

Jeffrey Scofield
Jeffrey Scofield

Reputation: 66818

This insertion code looks pretty nice to me. (I was confused by the counts for a while, but now I see they're counting the number of offspring.)

The function to remove the largest element (the root) is basically a deletion, which is always the most difficult. In essence you need to merge two trees while maintaining your invariant. I don't have time right now to work through it in detail, but I think it will turn out to be possible.

If you look in Okasaki (which you can do if you get stuck!) you'll see his trees have an extra invariant that makes it easier to do these operations. I'm pretty sure it's not something I would come up with right away. His implementation is based on an operation that merges two trees. It's used for insertion and deletion.

At a quick glance the Batteries heap code is based on "binomial trees", which are in fact a lot more complicated. They're explained in Okasaki also.

Update

Okasaki's book Purely Functional Data Structures is an elaboration of his PhD thesis. It appears that priority queues appear only in the book--sorry. If you're really interested in FP and not too strapped for cash the book is really worth owning.

As I said, your insert code looks great to me. It seems to me you actually have two invariants:

  • The value in a node is less than or equal to the values at the roots of its subtrees (ordering invariant).

  • The populations of the subtrees of a node differ by at most 1 (balance invariant).

As I said, I don't have time to verify in detail, but it looks to me like your insert code maintains the invariants and thus is O(log n).

The usefulness of this structure depends on your being able to delete the root in O(log n) while maintaining these two invariants.

The sketch of delete would be something like this:

let pop = function Leaf -> 0 | Node (_, _, _, p) -> p

let rec merge a b =
  (* populations of a, b differ by at most one. pop a >= pop b *)
  match a, b with
  | Leaf, Leaf -> Leaf
  | Leaf, _ -> b
  | _, Leaf -> a
  | Node (av, al, ar, ap), Node (bv, bl, br, bp) ->
      if av >= bv then Node (av, merge al ar, b, ap + bp)
      else Node (bv, merge al ar, insert av (delete_min b), ap + bp)

and delete_min = function
  | Leaf -> Leaf
  | Node (_, Leaf, Leaf, _) -> Leaf
  | Node (_, l, Leaf, _) -> l
  | Node (_, Leaf, r, _) -> r
  | Node (_, l, r, _) ->
    if pop l >= pop r then merge l r else merge r l

I still don't have a lot of time, so this might need some fixing up for correctness or for complexity.

Update

Being a purely cerebral guy, I (truly) never wondered what Chris Okasaki is like in real life. He teaches at West Point, and it's not too difficult to find his personal page there. It might satisfy some of your curiosity.

Upvotes: 3

Related Questions