m3nthal
m3nthal

Reputation: 433

How to efficiently read thousand of lines from STDIN in Erlang?

I've stumbled upon an issue when reading thousands of lines from STDIN. This would have been an imaginary edge case until I found out that some tests for this problem require reading thousand of lines from STDIN. At first I thought that my algorithms were not optimal, and only by accident I've found out that only reading lines without any computations could make half of the test time out.

Here is part code that times out:

process_queries(0, _) -> ok;
process_queries(N, A) -> 
    case io:fread("", "~s~d~d") of
        {ok, _} -> process_queries(N - 1, A)
        %{ok, ["Q", L, R]} -> process_queries(N - 1, apply_q(L, R, A));
        %{ok, ["U", Idx, Val]} -> process_queries(N - 1, apply_u(Idx, Val, A))
    end
.

I deliberately left comments to show that all the computations were disabled. So this code timed out given N=7984.

Is there a better way of reading and processing thousands lines from STDIN in Erlang?

Upvotes: 1

Views: 692

Answers (2)

Hynek -Pichi- Vychodil
Hynek -Pichi- Vychodil

Reputation: 26121

There is an excerpt from my solution. There are few tricks how to do it really efficient.

read_command(CP) ->
    {ok, Line} = file:read_line(standard_io),
    [C, A, B] = binary:split(Line, CP, [global, trim_all]),
    {case C of <<"Q">> -> 'Q'; <<"U">> -> 'U' end,
     binary_to_integer(A),
     binary_to_integer(B)}.

read_commands(N, CP) ->
    [ read_command(CP) || _ <- lists:seq(1, N) ].

execute(Array, L) ->
    lists:foldl(fun({'Q', F, T}, A) ->
                        {Val, A2} = query(A, F, T),
                        file:write(standard_io, [integer_to_binary(Val), $\n]),
                        A2;
                   ({'U', I, V}, A) ->
                        update(A, I, V)
                end, Array, L).

read_int_line(CP) ->
    {ok, Line} = file:read_line(standard_io),
    [binary_to_integer(X) || X <- binary:split(Line, CP, [global, trim_all])].

main() ->
    ok = io:setopts([binary]),
    CP = binary:compile_pattern([<<" ">>, <<$\n>>]),
    [N] = read_int_line(CP),
    L = read_int_line(CP),
    N = length(L),
    [K] = read_int_line(CP),
    execute(init(L), read_commands(K, CP)).

You have to write your own init/1, update/3 and query/3 of course.

Upvotes: 1

Dogbert
Dogbert

Reputation: 222268

I'd suggest switching stdio to binary and then using io:get_line. Your data's format is pretty simple to parse by splitting on whitespace and converting two values to integers. The following code runs ~10 times faster than your code for me in a simple benchmark. I used escript to benchmark, which means it's highly likely that the difference is actually more than 10 times since escript parses and compiles the code on the fly.

process_queries_2(0, _) -> ok;
process_queries_2(N, A) -> 
    Line = io:get_line(""),
    [X, Y0, Z0, _] = binary:split(Line, [<<$\s>>, <<$\n>>], [global]),
    Y = binary_to_integer(Y0),
    Z = binary_to_integer(Z0),
    % use X, Y, Z
    process_queries_2(N - 1, A).

Here's the code I used to benchmark:

main(["1"]) ->
  ok = io:setopts(standard_io, [binary]),
  process_queries(10000, {});
main(["2"]) ->
  ok = io:setopts(standard_io, [binary]),
  process_queries_2(10000, {}).%
$ time yes 'Q 100 200' | escript a.erl 1
yes 'Q 100 200'  4.64s user 0.11s system 93% cpu 5.063 total
escript a.erl 1  4.67s user 1.44s system 120% cpu 5.062 total
$ time yes 'Q 100 200' | escript a.erl 2
yes 'Q 100 200'  0.36s user 0.01s system 77% cpu 0.466 total
escript a.erl 2  0.40s user 0.10s system 106% cpu 0.464 total

The reason for the speedup is that Erlang Strings are linked lists, which are very inefficient both for CPU time and Memory usage compared to binaries, which is a sequential chunk of memory.

Upvotes: 1

Related Questions