Jenia Be Nice Please

Reputation: 2693

Why does my escript crash the interpreter?

I have a tiny program that reads a 100 MB CSV file. The problem is that my program crashes the Erlang interpreter:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot reallocate 3563526520 bytes of memory (of type "heap").
Aborted

Here is the program:

-module(tut6).
-export([readlines/1]).

readlines(FileName) ->
    {ok, Device} = file:open(FileName, [read]),
    try get_all_lines(Device)
      after file:close(Device)
    end.


get_all_lines(Device) ->
    case io:get_line(Device, "") of
        eof -> [];
        Line -> [Line | get_all_lines(Device)]
    end.

And I do:

Path="...csv".
Lines=tut6:readlines(Path).

And this produces a crash.

Can someone please tell me what the problem is? Maybe something is wrong with my program? How can I avoid the crashes?

Thanks in advance

Upvotes: 2

Views: 124

Answers (2)

Greg

Reputation: 8340

Did you realize that 3563526520 is 3.3 GB? How much memory does your system have? The gigantic memory consumption stems from the fact that you have chosen the least optimal algorithm for reading the lines:

  1. You try to read all the lines to the memory before acting on them
  2. You chose to represent the text as a list (a string), which uses 8 bytes for each character read from the file (or 16 bytes on 64-bit systems)
  3. You don't use tail recursion, which means the compiler can't optimize your code to be more memory efficient: every line read leaves a pending call on the stack until the end of the file is reached
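A rough estimate of point 2, assuming the 100M file holds about 10^8 single-byte (ASCII) characters and the VM is 64-bit:

```latex
\underbrace{10^{8}}_{\text{characters in the file}} \times \underbrace{16\ \text{bytes}}_{\text{one list cell per character}} = 1.6 \times 10^{9}\ \text{bytes} \approx 1.6\ \text{GB}
```

That is for the character data alone, before counting the outer list of lines and any garbage-collection overhead, and it is already the same order of magnitude as the 3.3 GB allocation reported in the crash dump.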

So, to fix the code:

  1. Read one line at a time, then parse and process it, and store the result as Erlang terms rather than as the raw input data
  2. Read lines as binaries, as suggested by Hynek -Pichi- Vychodil
  3. Make the function reading the file tail-recursive

Learn You Some Erlang has an excellent discussion of tail-recursive functions if you want to know how to implement them properly.

If the function were written in a tail-recursive manner, the whole algorithm could look like this:

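get_all_lines(Device) ->
    get_all_lines(Device, []).

%% List is an accumulator: each processed line is prepended, and the
%% list is reversed once at eof to restore the file order.
get_all_lines(Device, List) ->
    case io:get_line(Device, "") of
        eof ->
            lists:reverse(List);
        Line ->
            %% process_line/1 is yours to write: it should turn the raw
            %% line into a compact Erlang term instead of keeping the text.
            Data = process_line(Line),
            get_all_lines(Device, [Data | List])
    end.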
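For reference, here is one way the sketch above might be filled in and combined with binary mode from the other answer. This is a minimal sketch, not the answerer's code: the module name `csv_lines` is hypothetical, it swaps `io:get_line/2` for `file:read_line/1`, and the `process_line/1` shown is a deliberately naive placeholder that splits on commas and does not handle quoted fields.

```erlang
-module(csv_lines).
-export([readlines/1]).

%% Opens the file in binary mode so each line is a binary, not a list
%% of characters; read_ahead buffers the underlying reads.
readlines(FileName) ->
    {ok, Device} = file:open(FileName, [read, binary, read_ahead]),
    try get_all_lines(Device, [])
    after file:close(Device)
    end.

%% Tail-recursive loop: processed lines are accumulated in reverse
%% order and the list is reversed once at end of file.
get_all_lines(Device, Acc) ->
    case file:read_line(Device) of
        eof ->
            lists:reverse(Acc);
        {ok, Line} ->
            get_all_lines(Device, [process_line(Line) | Acc]);
        {error, Reason} ->
            erlang:error(Reason)
    end.

%% Hypothetical placeholder parser: drops the trailing newline and
%% splits the line on commas. Quoted fields are NOT handled.
process_line(Line) ->
    [Body | _] = binary:split(Line, <<"\n">>),
    binary:split(Body, <<",">>, [global]).
```

In the shell, `c(csv_lines).` followed by `Lines = csv_lines:readlines(Path).` returns one list of field binaries per row.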

Upvotes: 6

Hynek -Pichi- Vychodil

Reputation: 26121

Try

{ok, Device} = file:open(FileName, [read, binary]),

and then rethink what you are really up to.
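If what you actually need is an aggregate (a row count, a column sum) rather than every line in memory, one possible reading of this advice is to fold over the file so that only the current line and the accumulator are live at any time. A minimal sketch; the module `csv_fold` and the functions `fold_lines/3` and `count_rows/1` are hypothetical names:

```erlang
-module(csv_fold).
-export([fold_lines/3, count_rows/1]).

%% Calls Fun(Line, Acc) for every line (a binary, newline included)
%% and returns the final accumulator. No list of lines is built.
fold_lines(FileName, Fun, Acc0) ->
    {ok, Device} = file:open(FileName, [read, binary, read_ahead]),
    try fold_loop(Device, Fun, Acc0)
    after file:close(Device)
    end.

%% Tail-recursive: each call replaces the previous one on the stack.
fold_loop(Device, Fun, Acc) ->
    case file:read_line(Device) of
        eof -> Acc;
        {ok, Line} -> fold_loop(Device, Fun, Fun(Line, Acc));
        {error, Reason} -> erlang:error(Reason)
    end.

%% Example aggregate: counts the lines in the file.
count_rows(FileName) ->
    fold_lines(FileName, fun(_Line, N) -> N + 1 end, 0).
```

Memory use then stays roughly constant regardless of file size, because nothing but the accumulator survives from one line to the next.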

Upvotes: 2
