IceCold

Reputation: 21231

How to quickly display a large (GB) text file?

I want to quickly show the content of a large text file in my app without loading the whole file in memory.

How do others do it?

  1. Total Commander is a wonderful tool with an amazing internal viewer that does exactly this. It opens ANY file, no matter how big, instantaneously (or so fast that I can't time it). I tried it on a 12GB file. Memory usage is insignificant (only ~100KB) while it shows the file. How do they do it?

  2. SynEdit - the program freezes (for minutes) because it first parses the entire file and only THEN shows the text.

  3. LargeTextFile
    Approximates the size of the scroll bar. The scroll bar is adjusted continuously (it shrinks) until the program finally reads the entire file (could take minutes). Compared with Total Commander it really sucks.

  4. UltraEdit 32 - the program freezes (I had to kill it as I didn't have the patience (or RAM) to let it finish).

Upvotes: 4

Views: 2772

Answers (2)

dummzeuch

Reputation: 11252

Written in Delphi, source code available:

This is a very simple tool for displaying large text files, where large means they don't fit into the 2 GB of memory that a 32-bit Windows process can use. I successfully tested it displaying a 48 GB XML dump of the English language Wikipedia which contained 789,577,286 lines of text.

https://sourceforge.net/projects/dzlargetextview/

(Yes, I know, this is an old question, but an example might still be helpful.)

Upvotes: 3

Arnaud Bouchez

Reputation: 43053

You just read the file in blocks (e.g. in chunks of 64KB or 128KB), then you compute the lines within those blocks. Don't try to work with lines for the whole document (as Silvester proposes), but with blocks and offsets, then have the UI work around the fact that you don't know where all the lines are.
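
A minimal sketch of the block-based reading described above, in Python for brevity (the same idea maps onto any file API that supports seeking). The function names, the 64 KB default, and the assumption of `\n` line endings are illustrative choices, not taken from any particular viewer.

```python
BLOCK_SIZE = 64 * 1024  # 64 KB, as suggested above; any moderate size works


def read_block(path, offset, size=BLOCK_SIZE):
    """Read at most `size` bytes starting at byte `offset`; nothing else is loaded."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def lines_in_block(block, block_offset):
    """Split one block into (file_offset, line_bytes) pairs.

    The first and last pieces may be partial lines, because a block
    boundary can fall in the middle of a line.
    """
    result = []
    pos = 0
    for piece in block.split(b"\n"):
        result.append((block_offset + pos, piece))
        pos += len(piece) + 1  # +1 for the newline that split() removed
    return result
```

Memory use stays bounded by the block size regardless of how large the file is, which is what makes opening feel instantaneous.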

The scrollbar won't follow the lines but the offset in the file, and then the position within the blocks. When you move the bar, you find the closest line beginning and end within the chunk.
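
One way to sketch the scrollbar mapping, in Python for brevity: turn the scrollbar position into a byte offset, read the chunk that precedes it, and back up to the nearest line start. The function names, the chunk size, and the `\n`-only line ending are assumptions made for illustration.

```python
import os

CHUNK = 64 * 1024  # assumed chunk size


def offset_for_scroll(fraction, file_size):
    """Map a scrollbar position in [0.0, 1.0] to a byte offset in the file."""
    fraction = min(max(fraction, 0.0), 1.0)
    return int(fraction * file_size)


def snap_to_line_start(f, offset, chunk=CHUNK):
    """Return the offset of the start of the line containing `offset`.

    Only the `chunk` bytes before `offset` are searched; if no newline is
    found there, the chunk start is used, which effectively caps line length.
    """
    start = max(0, offset - chunk)
    f.seek(start)
    before = f.read(offset - start)
    nl = before.rfind(b"\n")
    if nl == -1:
        return start
    return start + nl + 1


def visible_lines(path, fraction, count, chunk=CHUNK):
    """Return up to `count` lines to draw for the given scrollbar position.

    The last returned line may be cut short if the chunk ends mid-line.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        top = snap_to_line_start(f, offset_for_scroll(fraction, size), chunk)
        f.seek(top)
        return f.read(chunk).split(b"\n")[:count]
```

Because the scrollbar range is the file size in bytes rather than a line count, it is known as soon as the file is opened and no full scan is needed.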

The drawback is that it is easier to impose a maximum line length, namely the chunk size. Total Commander wraps very long lines, I suppose because of its internal chunking algorithm.

Upvotes: 10
