RushK

Reputation: 555

Performance when reading a file line by line vs reading the whole file

Is there a noticeable difference (in theory) when reading a file line by line compared to reading the whole file in one go?

Reading the whole file has a negative impact on the amount of memory used, but does it work faster?

I need to read a file and process each line. I don't know whether I should read one line at a time and process it, or read the whole file, process it all, then write the output.

I've already set up the program to read line by line, and I want to know whether it is worth the effort to change it to read the whole file (which is not easy given my setup).
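
For reference, this is roughly the choice I'm weighing, sketched in Python purely for illustration (my actual program differs, and `process_line` is just a placeholder):

```python
def process_line(line):
    """Placeholder for the real per-line processing."""
    pass

# Option A: read and process one line at a time (small, steady memory footprint).
with open("input.txt") as f:
    for line in f:
        process_line(line)

# Option B: read the whole file in one go, then process (one big read,
# but the entire file is held in memory at once).
with open("input.txt") as f:
    contents = f.read()
for line in contents.splitlines():
    process_line(line)
```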

Thanks,

Upvotes: 13

Views: 7539

Answers (6)

user137717

Reputation: 2165

I think it would depend on the needs of your application (like most things, I know). Reading a 1 MB file in Node.js is roughly 3-4x faster with fs.readFile() than with a readable stream or line reader, as far as the file reading itself goes.

Streams may offer some additional performance if the file is very large and you are processing the input on the fly. They may also be the better choice if your application is already consuming a lot of memory, as a Node process has a ~1.5 GB memory limit on 64-bit systems. Processing chunks as they come in may also be more performant if the source of the data is slow relative to how fast the CPU can process it (archives on HDD or tape, network connections such as TCP).

As far as reading a file into memory vs. streaming it into memory goes, I am guessing the function call overhead of emitting data events and switching to the processing callback is what slows the streaming approach down.
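
(The numbers above are for Node.js. If you want to gauge the whole-file vs. streaming gap on your own machine, a rough timing harness along these lines is enough to see the trend. Python is used here purely for illustration, and the file path is a hypothetical test file.)

```python
import time

PATH = "sample.txt"  # hypothetical test file; substitute your own

# Whole file in one read.
start = time.perf_counter()
with open(PATH, "rb") as f:
    data = f.read()
print("whole file:  ", time.perf_counter() - start, "seconds")

# Line by line.
start = time.perf_counter()
with open(PATH, "rb") as f:
    for line in f:
        pass  # stand-in for per-line processing
print("line by line:", time.perf_counter() - start, "seconds")
```

Note that the OS page cache favours whichever variant runs second, so run each variant separately (or several times) for a fair comparison.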

Upvotes: 3

Miguel Grinberg

Reputation: 67507

Like others, I believe doing bigger reads will improve the performance of your application somewhat, but don't expect miracles: I/O is already buffered at the OS layer, so you will only gain by reducing the overhead of making too many read calls. Reading the whole file in one go is dangerous unless you know the maximum possible size of your input files. A more reasonable approach is to read the file in large blocks.

If you want to improve things even more, you should consider overlapping the I/O with the processing. Let's say you read the input file in blocks of 128 MB. On your main thread you read the first 128 MB block and then pass it to a worker thread for processing. While the worker thread gets to work, the main thread reads the second 128 MB block. From that point on, while the worker thread is processing block N, the main thread is reading block N+1 from disk.
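
A minimal sketch of that double-buffering pattern, using a worker thread and a bounded queue (Python here just for illustration; the block size, file name and `process_block` are placeholders, and in CPython the overlap comes from the read call releasing the GIL while the worker runs):

```python
import threading
import queue

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, as in the example above
work_queue = queue.Queue(maxsize=2)  # bounded, so reading stays at most one block ahead

def process_block(block):
    """Placeholder for the real per-block processing."""
    pass

def worker():
    while True:
        block = work_queue.get()
        if block is None:        # sentinel: no more blocks
            break
        process_block(block)

t = threading.Thread(target=worker)
t.start()

# Main thread: read block N+1 while the worker is busy with block N.
with open("input.dat", "rb") as f:
    while True:
        block = f.read(BLOCK_SIZE)
        if not block:            # EOF
            break
        work_queue.put(block)    # blocks if the worker falls too far behind

work_queue.put(None)             # tell the worker we're done
t.join()
```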

Upvotes: 2

Clare Macrae

Reputation: 3799

One factor is how much data you are going to be reading, and therefore how long the program takes to run in the first place, i.e. whether there is any benefit in working on its performance at all.

See the book quotes in this answer for some good, general advice on thinking about software performance.

(I know you're asking for an answer in theory, but this aspect of when to worry about performance is also important whenever you have a finite amount of time to spend.)

Upvotes: 0

James Anderson

Reputation: 27478

Reading the whole file will be slightly faster -- but not much!

But be careful: reading the whole file is not scalable, because you are limited by the memory available in the system. Once the file size exceeds the RAM available to your program, it will start using swap space, which is much slower. If the file size exceeds the available virtual memory, your program will crash.

Upvotes: 3

srikanta

Reputation: 2999

Reading the entire file into memory is generally not a good idea, because files can be huge, may take up a lot of memory and, in the worst case, may leave you running out of memory entirely. So, to balance performance and memory usage, you read a block of the file into a buffer and parse through the buffer. When you are done processing the block, read the next block, and repeat until EOF.

Deciding on a good block size will depend on what you want to achieve.
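
A minimal version of that read-a-block-then-parse loop, sketched in Python with the block size left as the tunable parameter (illustrative only; `handle_line` and the file name are placeholders). The `leftover` buffer carries the partial line that can straddle a block boundary:

```python
BLOCK_SIZE = 64 * 1024  # tunable: bigger blocks mean fewer reads but more memory

def handle_line(line):
    """Placeholder for the real per-line work."""
    pass

leftover = b""
with open("input.txt", "rb") as f:
    while True:
        block = f.read(BLOCK_SIZE)
        if not block:              # EOF
            break
        buffer = leftover + block
        lines = buffer.split(b"\n")
        leftover = lines.pop()     # last element may be an incomplete line
        for line in lines:
            handle_line(line)
if leftover:                       # final line without a trailing newline
    handle_line(leftover)
```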

Upvotes: 1

sys_debug

Reputation: 4003

To be honest, after studying efficiency for a while during my degree, I came to this conclusion about your question: it depends on how often the file is going to be read. If you are reading it once, then read the whole thing, because that frees the process up for other tasks. One more thing to keep in mind: is the file going to be edited later and require an update (as in, reading only the updated part)? If so, you might need to set a marker to recognise where to read from (and then again, how often is it updated?). But yes, if it is a one-time job, go ahead and read it as a whole, as long as you do not require tokens to be created from certain literals in the file. Hope this helps.

Upvotes: 2
