Reputation: 50
I want to choose n
distinct random elements in a list l
and return them in choose_elements
but I have a StackOverFlow
error for sufficiently large lists!
I tried to use a tail_recursive function choose_elem_aux
in order to do that but I think my condition List.mem
is not efficient enough in term of complexity!
I do this normally in other programming languages with a boolean mark array which I mark the index of each random number that is generated in true
!
But I can't do this in OCaml because I can't do multiple instruction into an if or else block! like this:
... else {
mark[r] =true ;
choose_elem_aux l n mark tmp ;
} ...
let choose l =
nth l (Random.int (List.length l)) ;;
let rec choose_elem_aux l n tmp =
if n=0 then tmp
else
let r=choose l in
if List.mem r tmp then
choose_elem_aux l n tmp
else
choose_elem_aux l (n-1) (r::tmp) ;;
let rec choose_elements l n =
choose_elem_aux l n [] ;;
StackOverflow for large list like:
choose_elements [1...10_000] 7 ;;
Upvotes: 0
Views: 323
Reputation: 36536
For efficiency I would suggest sorting the list first using a random int value from -1
to 1
(this has the effect of shuffling the list, though there are other means to accomplish this goal) and then selecting the first n
elements from that sorted list.
let choose n l =
l
|> List.sort (fun _ _ -> Random.int 3 - 1)
|> List.to_seq
|> Seq.take n
|> List.of_seq
If you need them randomly selected but in the same order they were in initially, you can map indexes onto them in the form of tuples, sort randomly, select them, and then re-sort based on the indexes. Finally, you would use List.map
to discard the no longer needed indexes.
let choose n l =
l
|> List.mapi (fun i x -> (i, x))
|> List.sort (fun _ _ -> Random.int 3 - 1)
|> List.to_seq
|> Seq.take n
|> List.of_seq
|> List.sort (fun (i, _) (j, _) -> compare i j)
|> List.map snd
What you're doing is like picking cards randomly from a 52 card deck and checking to see if it's one you've already drawn, then putting the card back into the deck and drawing again.
Not only might this result in numerous pointless draws as you draw the same card again and again, but each time you run your function you're A. iterating over the entire list to find its length, and B. iterating over your entire results accumulator to determine whether you've already drawn a card.
This yields an O(n2) algorithm, which scales poorly for large samples. Or it yields something worse than O(n2) because it's indeterminate. You might get that value you haven't picked yet after two tries... or after a thousand. And as you select more values, the likelihood of randomly drawing a value that hasn't previously been randomly drawn increases. Add in that lists are not built for efficient random access and this is just a bad idea over all.
Sorting can be done in O(n * log n) complexity, and everything else involved has linear runtime complexity, so the sort/shuffle first approach scales much better. For instance, 10,000 squared is 100,000,000, but 10,000 times log210,000 is approximately 130,000.
Array.shuffle
As of OCaml 5.2 there is the Array.shuffle
function in the standard library. You may find it more efficient to convert your list to an array, shuffle the array, and then select the first N elements as a list.
Upvotes: 1
Reputation: 50
Thanks to @théo-winterhalter I just to it in this way:
let rec choose_elem_aux l n mark tmp =
if n = 0 then
tmp
else
let r = Random.int (length l) in
if mark.(r) then
choose_elem_aux l n mark tmp
else (
mark.(r) <- true;
choose_elem_aux l (n-1) mark ((nth l r)::tmp);
);;
let rec choose_elements l n =
let mark = Array.make (length l) false in
choose_elem_aux l n mark [];;
Upvotes: 0
Reputation: 5108
First of all, ocaml does allow you to write several statements in an if then else
, it's just not written as in C-like languages.
if condition then begin
instruction1 ;
instruction 2 ;
(* ... *)
end
else begin
(* ... *)
end
The begin (* ... *) end
block works the same as parentheses, so you can also just parenthesise:
if condition then (
instruction1 ;
instruction 2 ;
(* ... *)
)
else (
(* ... *)
)
So you can do your optimisation just fine.
What happens here is that when writing if b then t else f
in ocaml, you are building a value of type T
if t : T
and f : T
.
You can for instance write if b then 0 else 1
or if b then "Hello" else "Goodbye"
.
It also works with the unit
type (the type of most instructions):
if b then instruction1 else instruction2
The semicolon operator allows to do two instructions sequentially:
(;) : unit -> unit -> unit
Note that it is not the same as in most language where it marks the end of an instruction.
The problem is that when you write
if b then instruction1 else instruction2 ; instruction 3
it is not understood as
if b then instruction1 else (instruction2 ; instruction 3)
as you wished, but as
(if b then instruction1 else instruction2) ; instruction 3
which also makes sense because the if
expression also has type unit
.
Upvotes: 1