daniele3004
daniele3004

Reputation: 13970

OCaml: Count different value in list of pairs

I have a list of pairs

let myList=[(0,1);(0,2);(0,3);(1,5);(2,4);(3,5);(5,4);(5,6);(4,3)];;

For count each single distinct value present in list I have this procedure

let rec flat lst  visited =
match lst with
[]->visited
| (x,y)::xs -> flat xs (x::y::visited)) ;;


let newLst = flat myList [];;

val newLst : int list =
  [4; 3; 5; 6; 5; 4; 3; 5; 2; 4; 1; 5; 0; 3; 0; 2; 0; 1]

let rec count lista = 
match lista with  
[]->0
| x::xs -> 
if (List.mem x xs) then count xs
else 1+count xs;;

count newLst;;
- : int = 7

The code run correctly but my question is:

Is there a more elegant or efficent way to do this? For example a unique function and not two

Upvotes: 2

Views: 1294

Answers (4)

Michaël Le Barbier
Michaël Le Barbier

Reputation: 6478

Your method works, is simple, and easy to understand. Its only downside, is that your code uses the Shlemiel the painter's algorithm. Here this means, that processing time behaves as a quadratic function of the size of the list.

If you want to remove it, it is appropriate to use sets: add all the numbers in your list to a set and compute its size. Now the time performance is in n log(n) and scales much better.

let myList=[(0,1);(0,2);(0,3);(1,5);(2,4);(3,5);(5,4);(5,6);(4,3)]

module IntegerSet = Set.Make(struct
    type t = int
    let compare = Pervasives.compare
  end)

let count lst0 =
  let rec loop acc lst =
    match lst with
    | [] -> IntegerSet.cardinal acc
    | (a,b)::tl -> loop IntegerSet.(add b (add a acc)) tl
  in
  loop IntegerSet.empty lst0

This code uses an accumulator acc which is filled in by iterating over the list. When all the list has been read, the number of elements in the accumulator is returned.

Upvotes: 3

Reimer Behrends
Reimer Behrends

Reputation: 8740

Your solution is basically how you do it without extensively resorting to library functions (and at the cost of quadratic worst-case performance). You can use the functions in the List library to get a simpler solution, but while that is a bit simpler, it will mostly teach you how to use that library and less about OCaml as a language [1]. That said, here's a solution that does just that:

let myList=[(0,1);(0,2);(0,3);(1,5);(2,4);(3,5);(5,4);(5,6);(4,3)]

let count l =
  let open List in
  let (a, b) = split l in length (sort_uniq compare (a @ b))

let () =
  Printf.printf "=> %d\n" (count myList)

This uses List.split and the list append operator @ to turn a list of pairs of ints into a list of ints, then sorts it and removes duplicates (List.sort_uniq), then uses List.length to count the results. This runs in time O(n*log(n)) due to sort_uniq.

Alternative solutions are to use the Set or Hashtbl modules to keep track of duplicates in a more efficient way than List.mem, thus also avoiding quadratic worst-case time (but also making the code more complicated in the process).

[1] I am assuming here that you're in the process of learning OCaml, so an industrial strength solution is not necessarily the best one to help your learning process, depending on where you are.

Upvotes: 2

Pierre G.
Pierre G.

Reputation: 4441

I would not argue about the elegance neither... A different way of writing your code : use fold operation. Your flat function can be written this way :

let flat  = List.fold_left (fun acc (x,y) -> x::y::acc) [] ;;

Upvotes: 2

Jeffrey Scofield
Jeffrey Scofield

Reputation: 66823

Elegance doesn't have a specific meaning, so it's hard to answer.

I think this is a reasonably nice way to solve the problem. If you imagine you had lots of different structures (lists of pairs, trees, etc.), the idea of translating to flat lists of ints then processing the lists in different ways has a nice feel.

One problem with your solution is that it's quadratic in the worst case, since you're searching lists of length 0, 1, 2, ... n * 2 for n pairs.

I suspect this isn't supposed to be production code, so the computational complexity might not matter.

If you were going to do this in production code where the lists were long and efficiency was important, you would do the counting directly on the list of pairs. And you wouldn't keep searching down the list for duplicates. You'd use some kind of set (maybe even a bit vector-like set) to track what you've seen. Very likely this is overkill for your intended use (it looks like a class assignment to me).

Upvotes: 1

Related Questions