OCaml cstub performance

Question

Does making multiple calls to external functions written in C affect the performance of an OCaml program?

For instance, let's assume that I want to create a function which creates a float list using the previous value in the list to compute the next iteration. For some reason, I wish this function to come from a cstub.

Does it make any difference in terms of performance whether I write everything in C or mix C external functions with my OCaml code?

I guess this question is related to what actually happens when compiling:

ocamlc -o hello.byte -c hello.cma cstub.o

Said differently, is there any material difference between doing:

external next_iter: float -> float = "next_iter"
let make_list n first_val =
    let rec aux acc current_val n =
        if n = 0 then (* I assume that n will never be <0 *)
            acc
        else
            let new_val = next_iter current_val in
            aux (new_val :: acc) new_val (n-1) in
    aux [] first_val n

and

external make_list: float -> int -> float list = "make_list"
(* Full implementation in C *)

Side question, if my cstub looks like this:

#include <caml/mlvalues.h>

CAMLprim value add_3(value x)
{
    int i = Int_val(x);
    return Val_int(i + 3);
}

Is the location of the returned value shared with the OCaml code or does OCaml reallocate a new part of the memory before using the value?

I am asking because I expect the second option to be especially inefficient when using the make_list solution from cstub.c for a large list (if it is implemented this way).

ivg · Accepted Answer

In general, calling a C function from OCaml has a small constant overhead.

First, the C calling convention usually differs from the OCaml calling convention and is less efficient. When a C function is called, the compiler has to save the registers that the call might clobber and restore them afterward. Also, if the C function may allocate values in the OCaml heap (which is assumed by default), the call is wrapped in code that sets up and clears garbage collector roots. If your function doesn't allocate, you can mark its external declaration with the [@@noalloc] attribute to remove the unnecessary GC setup. Finally, the OCaml compiler can't inline external calls, so some optimization opportunities are missed, such as code specialization and allocation elimination.

To put this in numbers, the call-wrapping code is usually about 10 extra assembly instructions. So if your C function is comparable in size, the overhead may be significant, and you may consider either making the call non-allocating or rewriting the function in OCaml. But in general C functions are much bigger, so the overhead is negligible.

As a final note, OCaml is not Python; it is very efficient, so there is rarely, if ever, a need to reimplement an algorithm in C. The external interface is mostly used for calling existing libraries that are not available in OCaml, invoking system calls, calling high-performance mathematical libraries, and so on.
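As a rough illustration of the [@@noalloc] attribute mentioned above, here is a minimal sketch of the question's loop. It assumes that next_iter can be stood in for by the C sqrt function, declared the same way the standard library declares its own sqrt, so the example links without a custom stub. A real next_iter would need its own C implementation, and [@@noalloc] is only safe if that implementation never allocates on the OCaml heap and never raises.

```ocaml
(* Stand-in for the question's next_iter (an assumption for illustration):
   the first name is the bytecode primitive, the second is the C function
   that native code calls directly on unboxed floats. *)
external next_iter : float -> float = "caml_sqrt_float" "sqrt"
  [@@unboxed] [@@noalloc]

(* Same loop as in the question; as there, the newest value ends up first. *)
let make_list n first_val =
  let rec aux acc current_val n =
    if n = 0 then acc
    else
      let new_val = next_iter current_val in
      aux (new_val :: acc) new_val (n - 1)
  in
  aux [] first_val n

let () =
  (* prints: 2 4 16 *)
  make_list 3 256.0 |> List.iter (Printf.printf "%g ");
  print_newline ()
```

With [@@unboxed], native code passes the float to C without boxing it, and with [@@noalloc] the compiler skips the GC bookkeeping around the call. The list itself is still built on the OCaml side.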

Side question

Is the location of the returned value shared with the OCaml code or does OCaml reallocate a new part of the memory before using the value?

In your example, the returned value is an immediate value that is passed in a CPU register, i.e., it is not allocated. Int_val and Val_int are simple macros that translate between the C int representation and the OCaml int representation: Val_int shifts the value left by one bit and sets the least significant bit, and Int_val shifts it back right by one bit.
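To make the tagging concrete, here is a small model of that arithmetic, written in OCaml rather than C so that it runs without any stub. The names tag and untag are hypothetical helpers introduced for illustration, not runtime functions. The last two lines use the Obj module to check that an int really is an immediate value while a float is a heap block.

```ocaml
(* Hypothetical helpers that mirror the arithmetic of the C macros. *)
let tag n = (n lsl 1) lor 1   (* Val_int direction: shift left, set low bit *)
let untag v = v asr 1         (* Int_val direction: arithmetic shift right *)

let () =
  Printf.printf "tag 3 = %d\n" (tag 3);                        (* 7 *)
  Printf.printf "untag 7 = %d\n" (untag 7);                    (* 3 *)
  Printf.printf "round trip of -5 = %d\n" (untag (tag (-5)));  (* -5 *)
  (* An int is immediate (low bit set), so nothing is allocated for it. *)
  Printf.printf "int is immediate: %b\n" (Obj.is_int (Obj.repr 42));
  (* A float, by contrast, is a pointer to a block. *)
  Printf.printf "float is a block: %b\n" (Obj.is_block (Obj.repr 3.14))
```

The low bit is what lets the garbage collector tell integers from pointers, which is also why OCaml ints are one bit narrower than the machine word.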

But in general, if a value is allocated with caml_alloc and friends, it lives in the OCaml heap and is not copied when it is returned (unless the GC moves it for its own purposes).
