Edwin Ansari
Edwin Ansari

Reputation: 50

Just want choose some random elements of a list and return them in OCaml

I want to choose n distinct random elements in a list l and return them in choose_elements but I have a StackOverFlow error for sufficiently large lists!

I tried to use a tail_recursive function choose_elem_aux in order to do that but I think my condition List.mem is not efficient enough in term of complexity! I do this normally in other programming languages with a boolean mark array which I mark the index of each random number that is generated in true! But I can't do this in OCaml because I can't do multiple instruction into an if or else block! like this:

... else {
mark[r] =true ;
choose_elem_aux l n mark tmp ;
} ...


let choose l = 
  nth l (Random.int (List.length l)) ;;
  
let rec choose_elem_aux l n tmp =
  if n=0 then tmp
  else
    let r=choose l in 
    if List.mem r tmp then
      choose_elem_aux l n tmp 
    else 
      choose_elem_aux l (n-1) (r::tmp) ;;

let rec choose_elements l n = 
  choose_elem_aux l n [] ;;

StackOverflow for large list like:

choose_elements [1...10_000] 7 ;;

Upvotes: 0

Views: 323

Answers (3)

Chris
Chris

Reputation: 36536

For efficiency I would suggest sorting the list first using a random int value from -1 to 1 (this has the effect of shuffling the list, though there are other means to accomplish this goal) and then selecting the first n elements from that sorted list.

let choose n l = 
  l 
  |> List.sort (fun _ _ -> Random.int 3 - 1) 
  |> List.to_seq 
  |> Seq.take n 
  |> List.of_seq

If you need them randomly selected but in the same order they were in initially, you can map indexes onto them in the form of tuples, sort randomly, select them, and then re-sort based on the indexes. Finally, you would use List.map to discard the no longer needed indexes.

let choose n l = 
  l 
  |> List.mapi (fun i x -> (i, x))
  |> List.sort (fun _ _ -> Random.int 3 - 1) 
  |> List.to_seq 
  |> Seq.take n 
  |> List.of_seq
  |> List.sort (fun (i, _) (j, _) -> compare i j)
  |> List.map snd

Complexity

What you're doing is like picking cards randomly from a 52 card deck and checking to see if it's one you've already drawn, then putting the card back into the deck and drawing again.

Not only might this result in numerous pointless draws as you draw the same card again and again, but each time you run your function you're A. iterating over the entire list to find its length, and B. iterating over your entire results accumulator to determine whether you've already drawn a card.

This yields an O(n2) algorithm, which scales poorly for large samples. Or it yields something worse than O(n2) because it's indeterminate. You might get that value you haven't picked yet after two tries... or after a thousand. And as you select more values, the likelihood of randomly drawing a value that hasn't previously been randomly drawn increases. Add in that lists are not built for efficient random access and this is just a bad idea over all.

Sorting can be done in O(n * log n) complexity, and everything else involved has linear runtime complexity, so the sort/shuffle first approach scales much better. For instance, 10,000 squared is 100,000,000, but 10,000 times log210,000 is approximately 130,000.

Array.shuffle

As of OCaml 5.2 there is the Array.shuffle function in the standard library. You may find it more efficient to convert your list to an array, shuffle the array, and then select the first N elements as a list.

Upvotes: 1

Edwin Ansari
Edwin Ansari

Reputation: 50

Thanks to @théo-winterhalter I just to it in this way:

let rec choose_elem_aux l n mark tmp =
  if n = 0 then 
    tmp
  else
    let r = Random.int (length l) in 
    if mark.(r) then
      choose_elem_aux l n mark tmp 
    else (
      mark.(r) <- true;
      choose_elem_aux l (n-1) mark ((nth l r)::tmp);
    );;

let rec choose_elements l n = 
  let mark = Array.make (length l) false in
  choose_elem_aux l n mark [];;

Upvotes: 0

Th&#233;o Winterhalter
Th&#233;o Winterhalter

Reputation: 5108

First of all, ocaml does allow you to write several statements in an if then else, it's just not written as in C-like languages.

if condition then begin
  instruction1 ;
  instruction 2 ;
  (* ... *)
end
else begin
  (* ... *)
end

The begin (* ... *) end block works the same as parentheses, so you can also just parenthesise:

if condition then (
  instruction1 ;
  instruction 2 ;
  (* ... *)
)
else (
  (* ... *)
)

So you can do your optimisation just fine.

What happens here is that when writing if b then t else f in ocaml, you are building a value of type T if t : T and f : T. You can for instance write if b then 0 else 1 or if b then "Hello" else "Goodbye". It also works with the unit type (the type of most instructions):

if b then instruction1 else instruction2

The semicolon operator allows to do two instructions sequentially:

(;) : unit -> unit -> unit

Note that it is not the same as in most language where it marks the end of an instruction.

The problem is that when you write

if b then instruction1 else instruction2 ; instruction 3

it is not understood as

if b then instruction1 else (instruction2 ; instruction 3)

as you wished, but as

(if b then instruction1 else instruction2) ; instruction 3

which also makes sense because the if expression also has type unit.

Upvotes: 1

Related Questions