G4143
G4143

Reputation: 2829

Split a list in two and preserve order

How do you efficiently split a list in 2, preserving the order of the elements?

Here's an example of input and expected output

[] should produce ([],[])
[1;] can produce ([1;], []) or ([], [1;])
[1;2;3;4;] should produce ([1; 2;], [3; 4;])
[1;2;3;4;5;] can produce ([1;2;3;], [4;5;]) or ([1;2;], [3;4;5;])

I tried a few things but I'm unsure which is the most efficient... Maybe there is a solution out there that I'm missing completely(calls to C code don't count).

My first attempt was to use List's partition function with a ref to 1/2 the length of the list. This works but you walk through the whole list when you only need to cover half.

let split_list2 l =
  let len = ref ((List.length l) / 2) in
  List.partition (fun _ -> if !len = 0 then false else (len := !len - 1; true)) l

My next attempt was to use a accumulator and then reverse it. This only walks through half the list but I call reverse to correct the order of the accumulator.

let split_list4 l =
  let len = List.length l in
  let rec split_list4_aux ln acc lst =
    if ln < 1
    then
      (List.rev acc, lst)
    else
      match lst with
      | [] -> failwith "Invalid split"
      | hd::tl ->
        split_list4_aux (ln - 1) (hd::acc) tl in
  split_list4_aux (len / 2) [] l

My final attempt used function closures for the accumulator and it works but I have no idea how efficient closures are.

let split_list3 l =
  let len = List.length l in
  let rec split_list3_aux ln func lst =
    if ln < 1
    then
      (func [], lst)
    else
      match lst with
      | hd::tl -> split_list3_aux (ln - 1) (fun t -> func (hd::t)) tl
      | _ -> failwith "Invalid split" in
  split_list3_aux (len / 2) (fun t -> t) l

So is there a standard way to split a list in OCaml(preserving element order) that's most efficient?

Upvotes: 0

Views: 495

Answers (3)

Chris
Chris

Reputation: 36680

Late answer, but offered as an option: solve this by reversing the list and then converting both the original and the reversed to sequences and taking the appropriate number of elements from each sequence. We can then reverse the elements taken from the reversed list and put them in a tuple.

Optimization: List.length and List.rev each require a complete traversal of the list. We can use List.fold_left to give us both in just one traversal.

let split_in_half lst =
  let (len, rev) = List.fold_left (fun (l, r) x -> (l+1, x::r)) (0, []) lst in
  let half_len = float_of_int len /. 2. in
  let first_half = 
    lst 
    |> List.to_seq 
    |> Seq.take (half_len |> Float.ceil |> int_of_float) 
    |> List.of_seq 
  in
  let second_half = 
    rev 
    |> List.to_seq 
    |> Seq.take (half_len |> Float.floor |> int_of_float) 
    |> List.of_seq 
    |> List.rev 
  in
  (first_half, second_half)

Upvotes: 1

G4143
G4143

Reputation: 2829

Couldn't give up this question. I had to find a way that I could walk through this list one time to create a split with order preserved..

How about this?

let split lst =
  let cnt = ref 0 in
  let acc = ref ([], []) in
  let rec split_aux c l =
    match l with
    | [] -> cnt := (c / 2)
    | hd::tl ->
      (
        split_aux (c + 1) tl;
        let (f, s) = (!acc) in
        if c < (!cnt)
        then
          acc := ((hd::f), s)
        else
          acc := (f, hd::s)

      )
  in
  split_aux 0 lst; !acc

Upvotes: 1

Jeffrey Scofield
Jeffrey Scofield

Reputation: 66823

You need to traverse the whole list for all of your solutions. The List.length function traverses the whole list. But it's true that your later solutions re-use the tail of the original list rather than constructing a new list.

It is difficult to say how fast any given bit of code is going to be just by inspection. Generally it's good enough to think in aysmptotic O(f(n)) terms, then work on slow functions in detail through timing tests (of realistic data).

All of your answers look to be O(n), which is the best you can do since you clearly need to know the length of the list to get the answer.

Your split_list2 and split_list3 solutions look pretty complicated to me, so I would expect (intuitively) them to be slower. A closure is a fairly complicated data structure containing a function and the environment of accessible variables. So it's problaby not all that fast to construct one.

Your split_list4 solution is what I would code up myself.

If you really care about timings you should time your solutions on some long lists. Keep in mind that you might get different timings on different systems.

Upvotes: 2

Related Questions