Foxy

Reputation: 1099

How can I speed up my function that reads a file and returns its contents?

I am new to OCaml, but I have experience in F# and Haskell. I'm really surprised by the apparent lack of functionality in the standard library that seems elementary. For example, I just want to read the contents of a file so that the text can then be parsed (several times). There doesn't seem to be any function that returns the contents of a file (there is In_channel.read_all, but it is part of the Jane Street library, which is not cross-platform, so I don't want to use it).

So I implemented my function with what the standard library offers, but I don't find it very idiomatic, and it is also very slow. How could I make it more efficient, or is there a better way to do what I want?

Here is the function:

let read_file filename =
    let res = ref "" in
    let read_contents = open_in filename in
    try while true
        do res := input_line read_contents ^ !res ^ "\n"
        done; !res
    with End_of_file -> close_in read_contents; !res

Moreover, if the file starts with newlines, the resulting string does not preserve them, which is a bit annoying but not too serious in my case.

Upvotes: 0

Views: 91

Answers (2)

Jeffrey Scofield

Reputation: 66803

It's true, the OCaml standard library is quite sparse.

If you don't mind using Unix primitives (many of which also work on Windows) you can read a file with just one read call like this:

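 let read_whole_file filename =
     let open Unix in
     (* The permission argument only matters when a file is created. *)
     let fd = openfile filename [O_RDONLY] 0o666 in
     (* Seek to the end to learn the file size, then rewind to the start. *)
     let len = lseek fd 0 SEEK_END in
     ignore (lseek fd 0 SEEK_SET);
     let res = Bytes.make len '\000' in
     (* A single read may return fewer bytes than requested. *)
     let n = read fd res 0 len in
     (* Close the descriptor before reporting a short read. *)
     close fd;
     if n <> len then failwith "partial read";
     res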

Note that this returns the result as bytes (a mutable array of characters, in essence). You can convert to string if necessary. In recent OCaml versions strings are immutable (which is how they should be IMHO).
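As a small sketch of that conversion: Bytes.to_string copies the buffer into an immutable string. The names contents and text below are only illustrative, and contents stands in for the value returned by read_whole_file.

```ocaml
(* [contents] is a stand-in for the bytes returned by read_whole_file. *)
let contents : bytes = Bytes.of_string "let x = 1\n"

(* Bytes.to_string makes a copy, so mutating [contents] afterwards
   does not change [text]. *)
let text : string = Bytes.to_string contents
```

Bytes.unsafe_to_string avoids the copy, but it is only safe if the bytes value is never modified afterwards.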

Update

I don't know how I missed these yesterday, but there are functions in the standard library that will do this. Here's a revised version in case it's useful:

 let read_whole_file filename =
     let chan = open_in_bin filename in
     let res =
         really_input_string chan (in_channel_length chan)
     in
     close_in chan;
     res

Note that this uses open_in_bin to avoid modification of line endings under Windows. This is necessary (I believe) to get agreement with the length returned by in_channel_length.
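For readers on OCaml 4.14 or later (an assumption about your compiler version), the standard library has its own In_channel module, unrelated to the Jane Street one, that should reduce this to a one-liner. A minimal sketch:

```ocaml
(* Requires OCaml >= 4.14. with_open_bin opens the file in binary mode and
   closes the channel when the callback returns or raises. *)
let read_whole_file filename =
  In_channel.with_open_bin filename In_channel.input_all
```

Unlike the version above, this also closes the channel if reading raises an exception.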

(It's still true that the OCaml standard library is pretty sparse.)

Upvotes: 2

octachron

Reputation: 18892

You can use the containers library and its read_all function (in the CCIO module) as an extension to the standard library.

Concerning your function, it is accidentally quadratic because

    res := input_line read_contents ^ !res ^ "\n"

allocates a new string, and copies everything accumulated so far, for each new line, so the total work grows with the square of the file size. It is better to use Buffer (or String.concat) when building a string by repeatedly appending small strings.
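A minimal sketch of the Buffer approach, keeping the line-by-line loop from the question. The names ic and buf and the initial capacity of 4096 are arbitrary choices, and Fun.protect assumes OCaml 4.08 or later. Note that it appends each line after the accumulated text, whereas the expression above puts the new line in front of it.

```ocaml
let read_file filename =
  let ic = open_in filename in
  (* The buffer grows geometrically, so appending stays cheap on average. *)
  let buf = Buffer.create 4096 in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      (try
         while true do
           (* input_line strips the newline, so add it back after each line. *)
           Buffer.add_string buf (input_line ic);
           Buffer.add_char buf '\n'
         done
       with End_of_file -> ());
      Buffer.contents buf)
```

Empty lines at the start of the file are kept, and lines stay in file order. One side effect of this design is that a final line without a trailing newline gains one.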

Upvotes: 4
