Locus

Reputation: 179

Reading all characters in OCaml is too slow

I'm a beginner with OCaml and I want to read lines from a file and then examine all characters in each line. As a dummy example, let's say we want to count the occurrences of the character 'A' in a file.

I tried the following:

open Core.Std

let count_a acc string = 
    let rec count_help res stream =
        match Stream.peek stream with
        | None -> res
        | Some char ->
            Stream.junk stream;
            if char = 'A' then count_help (res + 1) stream else count_help res stream
    in acc + count_help 0 (Stream.of_string string)

let count_a = In_channel.fold_lines stdin ~init:0 ~f:count_a

let () = print_string ((string_of_int count_a) ^ "\n")

I compile it with

 ocamlfind ocamlc -linkpkg -thread -package core -o solution solution.ml

run it with

$ ./solution < huge_file.txt

on a file with one million lines, which gives me the following times:

real    0m16.337s
user    0m16.302s
sys     0m0.027s

That is four times slower than my Python implementation. I'm fairly sure it should be possible to make this faster, but how should I go about it?

Upvotes: 1

Views: 341

Answers (1)

ivg

Reputation: 35210

To count the number of 'A' characters in a string, you can just use the String.count function. The simplest solution is:

open Core.Std

let () =
  In_channel.input_all stdin |>
  String.count ~f:(fun c -> c = 'A') |>
  printf "we have %d A's\n"
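If Core isn't available, a rough stdlib-only stand-in for String.count ~f:(fun c -> c = 'A') is a plain index loop. This is a minimal sketch, not Core's actual implementation, and count_char is a hypothetical name. It reads each character directly instead of going through Stream.peek/Stream.junk and the Some value they return on every character, as the question's code does.

```ocaml
(* Count occurrences of [c] in [s] with a plain index loop.
   Hypothetical helper standing in for Core's
   [String.count s ~f:(fun ch -> ch = c)]; uses only the stdlib. *)
let count_char (c : char) (s : string) : int =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    (* [s.[i]] is a bounds-checked read; no option or closure per character. *)
    if s.[i] = c then incr n
  done;
  !n

let () =
  (* Usage: count the 'A's in a literal string. *)
  Printf.printf "%d\n" (count_char 'A' "AbAcA")
```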

Update

A slightly more complicated (and less memory-hungry) solution, using fold_lines, looks like this:

let () =
  In_channel.fold_lines stdin ~init:0 ~f:(fun n s ->
    n + String.count ~f:(fun c -> c = 'A') s) |>
    printf "we have %d A's\n"
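For comparison, the same line-by-line fold can be sketched with only the OCaml standard library, assuming Core is not available. The fold_lines, count_char and count_a_lines names below are hypothetical helpers, not Core's API; fold_lines reads with input_line until End_of_file, so memory use is bounded by the longest line.

```ocaml
(* Stdlib-only sketch of a line fold, standing in for Core's
   [In_channel.fold_lines]; this [fold_lines] is a hypothetical helper. *)
let fold_lines (ic : in_channel) ~(init : 'a) ~(f : 'a -> string -> 'a) : 'a =
  let rec loop acc =
    (* [match ... with exception] keeps the recursive call outside the
       exception handler, so [loop] stays tail-recursive on large files. *)
    match input_line ic with
    | line -> loop (f acc line)
    | exception End_of_file -> acc
  in
  loop init

(* Count [c] in one line without Stream: no option allocated per character. *)
let count_char (c : char) (s : string) : int =
  let n = ref 0 in
  String.iter (fun ch -> if ch = c then incr n) s;
  !n

(* Count the 'A's one line at a time. *)
let count_a_lines (ic : in_channel) : int =
  fold_lines ic ~init:0 ~f:(fun n line -> n + count_char 'A' line)
```

In a program it would be wired up as let () = Printf.printf "we have %d A's\n" (count_a_lines stdin).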

It is, however, slower than the previous one: on my 8-year-old laptop it takes 7.3 seconds to count the 'A's in a 20-megabyte text file, versus 3 seconds for the former solution.
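One way to keep memory constant without allocating a string per line is to skip line splitting entirely and scan the input in fixed-size chunks. The sketch below uses only the standard library's input function; count_char_chunked and the 64 KiB default buffer size are assumptions for illustration, and no timings are claimed for it.

```ocaml
(* Read the channel in fixed-size chunks and count [c] in each chunk.
   Memory use is one reusable buffer; no string is allocated per line.
   Hypothetical helper; the default chunk size is an arbitrary choice. *)
let count_char_chunked ?(chunk_size = 65536) (c : char) (ic : in_channel) : int =
  let buf = Bytes.create chunk_size in
  let rec loop total =
    (* [input] returns how many bytes it read; 0 means end of file. *)
    let len = input ic buf 0 chunk_size in
    if len = 0 then total
    else begin
      let n = ref total in
      for i = 0 to len - 1 do
        if Bytes.get buf i = c then incr n
      done;
      loop !n
    end
  in
  loop 0
```

Usage: let () = Printf.printf "we have %d A's\n" (count_char_chunked 'A' stdin).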

Also, I hope you find this post interesting.

Upvotes: 3
