Jay Mody
Jay Mody

Reputation: 4073

Tail recursive remove duplicate consecutive entries in list

I'm attempting problem 8 of the 99 OCaml problems, which asks you to write a function compress that removes consecutive duplicate entires in a list:

assert (
  compress
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

I came to the following solution:

let head = function x :: _ -> Some x | [] -> None

let compress list =
  let rec fn old_list new_list =
    match (old_list, new_list) with
    | h :: t, _ ->
        fn t (if Some h = head new_list then new_list else h :: new_list)
    | _ -> new_list
  in
  List.rev (fn list [])
;;

The provided example solution by the website is as follows:

let rec compress = function
    | a :: (b :: _ as t) -> if a = b then compress t else a :: compress t
    | smaller -> smaller;;

At first, I thought that my solution was more efficient as it is tail recursive, while the provided solution is clearly not (requires us to keep the a in a :: compress t on the stack). However, when I test if my code is tail recursive:

assert (
  (compress [@tailcall])
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

It gives me a warning saying it's not tail recursive. Why?

From my understanding, my solution doesn't require keeping any state on the stack, which should make it tail recursive.

EDIT Also tried applying the [@tailcall] to fn directly via List.rev ((fn [@tailcall]) list []), get the same warning.

Upvotes: 2

Views: 131

Answers (2)

Chris
Chris

Reputation: 36680

When you make your assertion, compress isn't in tail position, (=) is. This has no bearing on your compress function implementation.

# assert (
  ((=) [@tailcall]) 
    (compress ["a"; "a"; "b"]) 
    ["a"; "b"]
);;
- : unit = ()

Similarly, in the expression List.rev ((fn [@tailcall]) list []), the call of fn is not in the tail call position.

You can test this by trying:

let compress list =
  let rec fn old_list new_list =
    match (old_list, new_list) with
    | h :: t, _ ->
        (fn[@tailcall]) t (if Some h = head new_list then new_list else h :: new_list)
    | _ -> new_list
  in
  List.rev (fn list [])

Note also that tail-recursive does not always mean more efficient. Tail-recursive functions are often less efficient, but they can be used on large data without a stack overflow. If you are dealing with data likely to cause this, it suggests you may need to re-evaluate the data structure you're using.

We could also, as of OCaml 4.14 make compress tail-recursive with tail_mod_cons.

let[@tail_mod_cons] rec compress = 
  function
  | a :: (b :: _ as t) -> 
      if a = b then compress t 
      else a :: compress t
  | smaller -> smaller;;

Alternatively, you might implement this with continuation passing.

let compress lst =
  let rec aux k = function
    | ([] | [_]) as lst -> k lst
    | a::(b::_ as t) when a = b -> aux k t
    | a::t -> aux (fun i -> k (a :: i)) t
  in
  aux Fun.id lst

As yet another fun alternative that is tail-recursive, you might write a compress function which generates a sequence. In order to do this, our aux function needs to take an argument to keep track of the last value seen. Because at the beginning there will not be a last value seen, the option type makes sense.

# let compress lst =
    let rec aux lst last_seen () =
      match lst, last_seen with
      | [], _ -> Seq.Nil
      | x::xs, Some x' when x = x' -> aux xs last_seen ()
      | x::xs, _ -> Seq.Cons (x, aux xs (Some x)) 
    in
    aux lst None;;
val compress : 'a list -> 'a Seq.t = <fun>
# compress [1;1;1;3;3;4;6;6;7;4] 
  |> Seq.take 3 
  |> List.of_seq;;
- : int list = [1; 3; 4]

Of course, it may also be helpful to have this work directly on a sequence as the input, allowing it to work on other data types without first converting to a list. The translation is very straightforward.

# let compress_seq seq =
    let rec aux seq last_seen () =
      match seq (), last_seen with
      | Seq.Nil, _ -> Seq.Nil
      | Seq.Cons (x, xs), Some x' when x = x' -> aux xs last_seen ()
      | Seq.Cons (x, xs), _ -> Seq.Cons (x, aux xs (Some x)) 
    in
    aux seq None;;
val compress_seq : 'a Seq.t -> 'a Seq.t = <fun>
# [1;1;1;3;3;4;6;6;7;4] 
  |> List.to_seq
  |> compress_seq
  |> Seq.take 3
  |> List.of_seq;;
- : int list = [1; 3; 4]

Upvotes: 2

Jay Mody
Jay Mody

Reputation: 4073

Okay, so I figured it out.

1) Is it tail recursive?

To test if the functions were tail recursive, I decided to just try and break them by inducing a stack overflow:

let _ = compress (List.init 10_000_000 (fun x -> Some x))

For my implementation, this works just fine. The provided solution, on the other hand, results in a segvault, which I'm assuming is from stack overflow:

[1]    70387 segmentation fault  ./a.out

So we can conclude that my implementation is indeed tail recursive, while the other one is not.

2) Which one is faster?

I used the following to test the speed of both functions:

let _ =
  let l = List.init 250000 (fun x -> x) in
  let t = Sys.time () in
  let f = compress l in
  Printf.printf "Execution time: %fs\n" (Sys.time () -. t);
  f

Note, 250000 is the limit on my machine before things stack-overflowed.

For the tail recursive implementation, it was around 0.018s.

For the non-tail recursive implementation, it was around 0.013s.

So seems that the overhead of function calls is not enough to make the non-tail recursive implementation slower than the tail recursive one, which requires 2 passes of the list.

I should also be noted, this is also for the worst case, where our input list List.init 250000 (fun x -> x) is a list with all unique elements. The amount of stack space required for the non-tail recursive implementation is proportional to the number of unique elements, not the number of elements in the list, because in the left branch of if a = b then compress t else a :: compress t, we don't use any stack space. I tested this by changing the list to only hold a constant and made the list much larger List.init 100000000 (fun x -> 0), and the non-tail recursive implementation no longer seg faults. It seems the OCaml compiler is smart enough to know that if the left branch of if a = b then compress t else a :: compress t is tail recursive, and only allocates a stack if the right branch is hit.

Upvotes: 1

Related Questions