Andy k

Reputation: 1116

Why can't TStringList.LoadFromFile load a reasonably sized file?

Compiling for 32-bit on a Windows 10 system with 8 GB of memory.

Trying to load a 150 MB ASCII file using TStringList.LoadFromFile raises an "Out of memory" error, with Task Manager reporting 1200 MB in use.

Even the 50% redundancy of Unicode can't explain that level of inefficiency!

Any idea what's going on?

Example code.

unit Unit3;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;

type
  TForm3 = class(TForm)
    OpenDialog1: TOpenDialog;
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form3: TForm3;

implementation

{$R *.dfm}

procedure TForm3.Button1Click(Sender: TObject);
var
  f: TStringList;
begin
  if OpenDialog1.Execute then
  begin
    f := TStringList.Create;
    try
      f.LoadFromFile(OpenDialog1.FileName);
    finally
      f.Free;
    end;
  end;
end;

end.

As requested, some representative text from the file is below:

AcDbPolyline
 90
5
 70
0
 10
100091.01
 20
59019.75
 10
100077.39
 20
59001.49
 10
100070.7
 20
58974.72
 10
100066.85
 20
58942.73
 10
100065.12
 20
58920.69
  0
LWPOLYLINE
  5

Notepad++ reports the file as having 27 million lines.

Upvotes: 3

Views: 1817

Answers (1)

David Heffernan

Reputation: 613451

Each string requires a separate heap allocation. Each heap allocation yields a block of memory plus some metadata used by the memory manager. On top of that, the string has its own metadata: a reference count and a length.

If your file contains a lot of very short lines, the metadata can easily dominate. In the extreme case, a single-character string could easily consume 20 bytes or more. Off the top of my head, I don't know the actual overhead figures, so this is a guesstimate.

With very short lines there will be a lot of strings. The array of pointers owned by the string list could itself be very large, and if it is grown by repeated reallocation, that could result in fragmentation. You say there are 27 million lines. At two pointers per line, one for the string and one for its associated object, and 4 bytes per pointer in a 32-bit process, that's over 200 MB for the array alone.
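To put rough numbers on this, here is a back-of-the-envelope sketch. The line count and the 32-bit pointer size come from the question. Everything else is an assumption for illustration, not a measurement: a 12-byte string header (the usual 32-bit UnicodeString layout), a 4-byte allocator header per block, rounding to 8-byte blocks, and an average of 4 characters per line (150 MB over 27 million lines, less the line break).

```delphi
program EstimateStringListMemory;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  LineCount        = 27000000; // from the question (Notepad++ line count)
  PointerSize      = 4;        // 32-bit target, as in the question
  // Everything below is an assumed figure, not a measured one.
  StringHeaderSize = 12;       // assumed: code page, element size, ref count, length
  AllocatorHeader  = 4;        // assumed per-block memory manager overhead
  BlockGranularity = 8;        // assumed rounding applied by the memory manager
  AvgCharsPerLine  = 4;        // assumed: about 150 MB / 27 million lines, minus CRLF

// Round Value up to the next multiple of Multiple.
function RoundUp(Value, Multiple: Int64): Int64;
begin
  Result := ((Value + Multiple - 1) div Multiple) * Multiple;
end;

var
  ItemArrayBytes, PerStringBytes, AllStringsBytes: Int64;
begin
  // Two pointers per line: the string reference and the associated object.
  ItemArrayBytes := Int64(LineCount) * 2 * PointerSize;

  // Header + UTF-16 characters + null terminator + allocator header, rounded up.
  PerStringBytes := RoundUp(
    StringHeaderSize + (AvgCharsPerLine + 1) * SizeOf(Char) + AllocatorHeader,
    BlockGranularity);

  AllStringsBytes := Int64(LineCount) * PerStringBytes;

  Writeln(Format('Item array:       %d bytes', [ItemArrayBytes]));
  Writeln(Format('Per string:       %d bytes', [PerStringBytes]));
  Writeln(Format('All strings:      %d bytes', [AllStringsBytes]));
  Writeln(Format('Estimated total:  %d bytes', [ItemArrayBytes + AllStringsBytes]));
end.
```

Under these assumptions the total lands at roughly 1 GB, and that is before counting the temporary copies of the file contents that LoadFromFile holds while it splits the text into lines. That is the same ballpark as the 1200 MB reported in the question, and uncomfortably close to what a 32-bit process can address.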

All in all, a TStringList is the wrong type for your task. You may be best off reading the file line by line with a stream reader object, or loading the entire file into memory and using a dedicated type to map lines to offsets, as a text editor does. In fact, text editors typically use memory mapping to avoid address space crises. Without knowing what you want to do with the file, it's hard to make recommendations, but I hope I've helped you understand why your current approach has no future.
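As a minimal sketch of the line-by-line option, the console program below streams a file through TStreamReader and keeps only one line in memory at a time. The program name, the ScanFile helper and the statistics it gathers are hypothetical, chosen only to give the loop something to do; TEncoding.ANSI is an assumption based on the file being described as ASCII.

```delphi
program StreamLines;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: walks the file one line at a time and reports
// the number of lines and the length of the longest one.
procedure ScanFile(const FileName: string; out LineCount: Int64;
  out LongestLine: Integer);
var
  Reader: TStreamReader;
  Line: string;
begin
  LineCount := 0;
  LongestLine := 0;
  // Assumption: the file is plain ASCII/ANSI text without a BOM.
  Reader := TStreamReader.Create(FileName, TEncoding.ANSI);
  try
    while not Reader.EndOfStream do
    begin
      // Only the current line is held as a string; the previous one
      // is released when Line is reassigned.
      Line := Reader.ReadLine;
      Inc(LineCount);
      if Length(Line) > LongestLine then
        LongestLine := Length(Line);
    end;
  finally
    Reader.Free;
  end;
end;

var
  Count: Int64;
  Longest: Integer;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: StreamLines <file>');
    Exit;
  end;
  ScanFile(ParamStr(1), Count, Longest);
  Writeln(Format('%d lines, longest is %d characters', [Count, Longest]));
end.
```

Memory use here stays roughly constant regardless of file size, because nothing accumulates across iterations. Whether this fits depends on the processing needed: it suits a single forward pass, but not random access to arbitrary lines.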

Upvotes: 4
