V. Semeria

Reputation: 3256

OCaml optimization of unfold followed by fold

There is a well-known compiler optimization of an unfold followed by a fold, a composition called a hylomorphism (in other words: a loop). Simply put, instead of building a structure and consuming it immediately afterwards, you plug the folding function directly into the generator of the structure, which results in a plain loop that never creates the structure in memory at all.

For example, to sum some numbers input by the user, we could do

let sumInput () : int =
  List.fold_left (+) 0
    (List.init 1875 (fun _ -> read_int ()))

The intermediate structure is the list that stores the input numbers, constructed by List.init. I would expect OCaml's native compiler, ocamlopt, to optimize this as a tail-recursive loop like

let optimSumInput () : int =
  let rec optimInput_rec count sum =
    if count = 0 then sum
    else optimInput_rec (count-1) (sum + read_int ()) in
  optimInput_rec 1875 0

However, ocamlopt -S produces this assembly code, where we see both camlStdlib__list__init and camlStdlib__list__fold_left:

camlTestOptim__sumInput_80:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_adjust_cfa_offset 8
.L102:
    movq    camlTestOptim__3@GOTPCREL(%rip), %rbx
    movq    $3751, %rax
    call    camlStdlib__list__init_204@PLT
.L100:
    movq    %rax, %rdi
    movq    $1, %rbx
    movq    camlTestOptim__2@GOTPCREL(%rip), %rax
    addq    $8, %rsp
    .cfi_adjust_cfa_offset -8
    jmp camlStdlib__list__fold_left_250@PLT
    .cfi_adjust_cfa_offset 8
    .cfi_adjust_cfa_offset -8
    .cfi_endproc
    .type camlTestOptim__sumInput_80,@function
    .size camlTestOptim__sumInput_80,. - camlTestOptim__sumInput_80

Is there a compiler switch that would optimize the list away? In this particular example it would not speed things up much, but for structures more complicated than lists it can make a difference.

Upvotes: 2

Views: 331

Answers (2)

ivg

Reputation: 35210

No, OCaml's optimizing compiler does not perform such optimizations and probably never will. This answers the posted question. Below is my personal opinion, which is not an answer but a comment that I think is necessary for a correct understanding of the language and of why such optimizations should not be expected.

The optimizing compiler of OCaml is known for its predictability and for its ability to compile functional programs to efficient machine code. It will not, however, turn bad programs into good ones: the compiler follows the semantics of the language closely and believes that programmers know what they are doing. Unlike Haskell, OCaml is a strict imperative language with a stronger semantics. It gives the programmer much more control over the execution process than Haskell does, but that control comes at a price, as certain optimizations become hard, if not impossible, to implement. In OCaml, all expressions are impure (effectful) by default, so in order to remove an expression and substitute it with its value, the compiler has to prove that no effects are lost in the process. Another important aspect to keep in mind, when coming from Haskell territory, is that OCaml is strict and OCaml lists are a strict inductive data type. So while such optimizations may look natural in Haskell, with its call-by-name semantics and lazy lists, in OCaml they would go too far and would essentially violate the contract between the programmer and the compiler.
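
To make the effect argument concrete, here is a minimal sketch (with_list and fused are names made up for this illustration, not something the compiler produces): the list-building version performs every generator effect before any folding effect, whereas the fused loop interleaves them, so the rewrite is only valid when the compiler can prove that this reordering is unobservable.

let with_list gen use n =
  (* the whole list is built first: every gen runs before any use *)
  List.fold_left use 0 (List.init n gen)

let fused gen use n =
  (* fused loop: gen and use alternate *)
  let rec loop i acc =
    if i = n then acc else loop (i + 1) (use acc (gen i))
  in
  loop 0 0

let () =
  let gen i = Printf.printf "gen %d\n" i; i in
  let use acc x = Printf.printf "use %d\n" x; acc + x in
  ignore (with_list gen use 2);  (* prints gen, gen, then use, use *)
  ignore (fused gen use 2)       (* prints gen, use, gen, use *)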

Since OCaml gives us, the programmers, much more control over the execution process, we have lots of different data structures that are mathematically isomorphic but offer different trade-offs. It is our task as programmers to choose the representation that best reflects our needs and constraints. Given your concrete example, the choice of a list as the underlying data structure is a little bit odd and not really justified. But the compiler trusts the programmer, and if the programmer decided to use a list here, it will follow that request.

A seasoned programmer will choose a better data structure here, e.g., Core's Sequence.t, Lwt's stream, or simply the standard Seq.t iterator:

let rec ints () = match read_int () with
  | exception End_of_file -> Seq.Nil
  | s -> Seq.Cons (s,ints)

and we can define a sum of integers as

let sum = Seq.fold_left (+) 0

therefore our main function will look like

let main () =
  print_int (sum ints);
  print_newline ()

This code will not create any intermediate list, so deforestation happens naturally, without us having to rely on obscure compiler optimizations.
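
For the record, the bounded example from the question can be expressed the same way without an intermediate list. Here is a sketch assuming OCaml >= 4.14, which provides Seq.init (sumInput_seq is a name made up for this illustration):

let sumInput_seq () : int =
  (* Seq is lazy: each read_int runs only when fold_left demands the
     next element, so no list is ever materialized *)
  Seq.fold_left (+) 0 (Seq.init 1875 (fun _ -> read_int ()))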

This distinction with Haskell draws a line between programming and pure mathematics: while different sorts of lists are mathematically equivalent, there is a great variety of implementations that differ from the programmer's perspective.

Therefore, OCaml relies on the programmer to make this choice and does not try to be cleverer than a human. On the other hand, it provides a very powerful abstraction mechanism, a reification of Sigma-algebras, that lets us program against abstractions and remain in the world of pure mathematics when we want to, and return to Earth only when it is absolutely necessary.

Upvotes: 7

coredump

Reputation: 38809

There is currently no such switch. You probably need to use a stream fusion library like strymonas (see the OCaml implementation).

When the execution reaches the run method, an optimized version of the stream will be compiled (emitted) and executed [...]. The resulted code will consist of a tight, for-loop.

The library does not have access to the compiler's internals, which might limit how deeply it can be integrated with the language and how the bytecode or native code is emitted. But this also leaves the core language simple, with a straightforward semantics.

As an optimization pass in the compiler, it would fire on some expressions but not on others; that would make it a little more difficult for users (developers) to predict ahead of time how their code behaves at runtime (see, e.g., how confusing tail-call elimination can already be). It also has a cost as a compiler feature, in terms of maintenance (testing, forward/backward compatibility).
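
As a small illustration of that last point (the function names are made up for this sketch): the first version below looks like a simple loop, but the recursive call is an argument of (+), so it is not a tail call and the stack grows with the input; only the accumulator-passing version is compiled to a jump.

let rec sum_all = function
  | [] -> 0
  | x :: xs -> x + sum_all xs      (* not a tail call: the (+) is still pending *)

let sum_acc l =
  let rec go acc = function
    | [] -> acc
    | x :: xs -> go (acc + x) xs   (* tail call: compiled to a jump *)
  in
  go 0 l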

Those are some of the possible reasons, along with the lack of resources, why this optimization is not currently done directly in the compiler, and maybe is not desirable in the first place.

Upvotes: 2
