lllllllllllll
lllllllllllll

Reputation: 9110

Optimize a file writing operation in OCaml?

basically in my project, I am trying to write a list of strings into file like this:

val mutable rodata_list : (string*string) list = []
.....
let zip1 ll =
   List.map (fun (h,e) -> h^e) ll in
let oc = open_out_gen [Open_append; Open_creat] 0o666 "final_data.s" in
  List.iter (fun l -> Printf.fprintf oc "%s\n" l) (zip1 rodata_list);  

Here is my problem, usually the rodata_list can reach as long as 800,000 size, and the above code on our server (64-bit, 32 core Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz) takes about 3.5 seconds.. The OCaml version I use is 4.01.0.

This is not acceptable, especially as I have 4 piece of code like this to write into a file. Totally they could take me over 15 seconds..

I tried this:

Printf.fprintf oc "%s\n" (String.concat "\n" (zip1 rodata_list));

But no obvious improvement..

So I am wondering that, how to optimize this part? I appreciate any solutions. Thank you!

Upvotes: 0

Views: 224

Answers (1)

ivg
ivg

Reputation: 35210

  1. Don't use ^ to concatenate a bunch of strings in performance critical code, as it will lead to quadratic complexity;
  2. Try not to rely on *printf functions, when performance matters (although in OCaml 4.02 it is pretty fast);
  3. Don't apply several iterations on a list in a row, since OCaml doesn't have a deforesting. Try to do as much operations in an iteration as possible;
  4. If you're using lists of 1 million elements, then you're actually doing something wrong. Try to use different data structure;

So, given the advices above we have the following:

List.iter (fun (x,y) -> 
  output_string oc x;
  output_string oc y;
  output_char oc '\n') rodata_list

Also, any optimizations should start from profiling, to get the profile you need to compile it with profiling info, for example like this:

 ocamlbuild myprogram.p.native

Then you can run program to collect the profile, that can be read with gprof. My guess, that you will spend all the time not in the actual IO, or even concatenation, but in garbage collection, since your zip, will create millions of string.

How fast it should be

So to proof, that you're actually trying to optimize wrong part of your code, I've wrote this small program:

let rec init_rev acc = function
  | 0 -> acc
  | n -> init_rev (("hello", "world") :: acc) (n-1)

let () = List.iter (fun (x,y) -> 
  print_string x;
  print_endline y) (init_rev [] 1000_000)

It creates a list of one million elements and outputs it:

$ ocamlbuild main.native 
$ time ./main.native > data.txt

real    0m0.998s
user    0m0.211s
sys     0m0.783s

This is on macbook laptop. Moreover we spend most of the time in the system, with only 200ms in OCaml. And a simple loop for 1000_000 iterations without creating a list, takes only 11ms.

So, profile.

Upvotes: 6

Related Questions