Iris Gaber

Reputation: 41

What's more efficient: reading from a file or allocating memory?

I have a text file, and I need to allocate an array with as many entries as there are lines in the file. Which is more efficient: reading the file twice (the first time only to count the lines) and allocating the array once, or reading the file once and calling "realloc" after each line read? Thank you in advance.

Upvotes: 2

Views: 512

Answers (3)

Surt

Reputation: 16129

I presume you also want to store the lines you read, not just allocate an array with that many entries.

I also presume you don't want to modify the lines and write them back; in that case you might be better off using mmap.

Reading a file twice is always bad: even if it is cached the second time, too many system calls are needed. Allocating every line separately is also a waste of time if you don't need to free the lines in arbitrary order.

Instead read the entire file at once, into an allocated area.

Find the number of lines by finding line feeds.

Allocate an array of pointers with that many entries.

Put the start pointer of each line into the array by finding the same line feeds again.
If you need the lines as strings, replace each line feed with \0.
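
A minimal sketch of the steps above, in standard C. The names read_all and split_lines are made up for illustration, and the 4096-byte starting size is an arbitrary guess. read_all reads the stream in chunks rather than asking for the file size, so it makes no assumption about the input being seekable; it does not distinguish a read error from end of file, which real code should check with ferror.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the rest of fp into one malloc'ed,
 * NUL-terminated buffer. Returns NULL if allocation fails. */
char *read_all(FILE *fp, size_t *len_out)
{
    assert(fp != NULL && len_out != NULL);

    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Always leave one byte free for the terminating '\0'. */
    while ((got = fread(buf + len, 1, cap - 1 - len, fp)) > 0) {
        len += got;
        if (len == cap - 1) {               /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* Hypothetical helper: split buf (len bytes, with buf[len] == '\0')
 * into lines in place. Pass 1 counts the lines, pass 2 stores the start
 * pointers and overwrites each '\n' with '\0'. The returned array points
 * into buf; free the array and buf separately. */
char **split_lines(char *buf, size_t len, size_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            count++;
    if (len > 0 && buf[len - 1] != '\n')
        count++;                            /* last line has no '\n' */

    /* Ask for at least one element so an empty file does not yield malloc(0). */
    char **lines = malloc((count ? count : 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t n = 0;
    char *start = buf;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            lines[n++] = start;
            start = buf + i + 1;
        }
    }
    if (start < buf + len)
        lines[n++] = start;                 /* unterminated final line */

    *count_out = n;
    return lines;
}
```

Typical use would be to call read_all on an open FILE *, pass the buffer and length to split_lines, and later free the pointer array and the buffer (two frees in total, no matter how many lines there are).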

This might be improved further on modern CPU architectures: instead of scanning the buffer twice, it may be faster to allocate a "large enough" array for the pointers and scan the buffer once. That costs a realloc at the end to trim the array to the right size, and possibly a few more along the way to grow the array if it wasn't large enough at the start.

Why is this faster? Because the scan has a lot of ifs, which can take a lot of time for each line, so it's better to do it only once. The cost is the reallocation, but copying large arrays with memcpy can be a bit cheaper.

But you have to measure it: your system settings, buffer sizes, and so on will influence things too.
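
A sketch of that single-scan variant, assuming the whole file is already in a NUL-terminated buffer. The name index_lines_once and the initial_cap parameter are hypothetical; the doubling growth factor is my choice, not something the approach requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the line-pointer array in one scan of buf
 * (len bytes). The array starts at initial_cap entries, doubles whenever
 * it fills up, and is trimmed to the exact size at the end.
 * Each '\n' is overwritten with '\0'. Returns NULL on allocation failure. */
char **index_lines_once(char *buf, size_t len, size_t initial_cap,
                        size_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    size_t cap = initial_cap ? initial_cap : 1;
    size_t n = 0;
    char **lines = malloc(cap * sizeof *lines);
    if (lines == NULL)
        return NULL;

    char *p = buf;
    char *end = buf + len;
    while (p < end) {
        if (n == cap) {                     /* guess was too small: grow */
            char **tmp = realloc(lines, cap * 2 * sizeof *lines);
            if (tmp == NULL) {
                free(lines);
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }
        lines[n++] = p;

        /* memchr finds the next line feed, so there is no per-byte
         * branch in our own code. */
        char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            break;                          /* final line has no '\n' */
        *nl = '\0';
        p = nl + 1;
    }

    if (n > 0) {                            /* trim to the right size */
        char **tmp = realloc(lines, n * sizeof *lines);
        if (tmp != NULL)
            lines = tmp;
    }
    *count_out = n;
    return lines;
}
```

With doubling, the number of realloc calls grows only logarithmically with the line count, so a poor initial guess costs a handful of extra calls rather than one per line.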

Upvotes: 1

Andrew Henle

Reputation: 1

The answer to "What's more efficient/faster/better? ..." is always:

Try each one on the system you're going to use it on, measure your results accurately, and find out.

The term is "benchmarking".

Anything else is a guess.
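
As a rough illustration only, a timing helper could look like the sketch below. The names time_call and bump_counter are made up, and bump_counter is just a stand-in for the code you actually want to measure. It uses C11 timespec_get, which reports wall-clock time; that clock can be adjusted while the program runs, so treat single measurements with suspicion and repeat them.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: call fn(arg) reps times and return the elapsed
 * wall-clock time in seconds. */
double time_call(void (*fn)(void *), void *arg, int reps)
{
    assert(fn != NULL && reps > 0);

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < reps; i++)
        fn(arg);
    timespec_get(&t1, TIME_UTC);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Stand-in workload: replace with the read-twice or read-once variant. */
void bump_counter(void *arg)
{
    ++*(int *)arg;
}
```

Run each candidate several times on the real input, since the first run may pay for a cold file cache that later runs do not.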

Upvotes: 0

rici

Reputation: 241941

Reading the file twice is a bad idea, regardless of efficiency. (It's also almost certainly less efficient.)

If your application insists on reading its input twice, that means its input must be rewindable, which excludes terminal input and pipes. That's a limitation so annoying that apps which really need to read their input more than once (like sort) generally have logic to make a temporary copy if the input is unseekable.

In this case, you are only trying to avoid the trivial overhead of a few extra malloc calls. That's not enough justification to limit the application's input options.

If that's not convincing enough, imagine what will happen if someone appends to the file between the first time you read it and the second time. If your implementation trusts the count it got on the first read, it will overrun the vector of line pointers on the second read, leading to Undefined Behaviour and a potential security vulnerability.
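
For comparison, here is a sketch of reading the input exactly once and growing the pointer array as needed, so it also works on pipes and terminals and never relies on a count taken earlier. It assumes POSIX getline is available (it is not part of standard C); the name read_lines and the starting capacity of 16 are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of fp in a single pass.
 * On success returns 0 and stores a malloc'ed array of malloc'ed
 * strings (trailing '\n' removed) in *lines_out; returns -1 if an
 * allocation fails. The array capacity doubles when it fills up, so
 * there are only a few realloc calls even for very large inputs. */
int read_lines(FILE *fp, char ***lines_out, size_t *count_out)
{
    assert(fp != NULL && lines_out != NULL && count_out != NULL);

    char **lines = NULL;
    size_t cap = 0, n = 0;
    char *line = NULL;
    size_t bufsz = 0;
    ssize_t got;

    while ((got = getline(&line, &bufsz, fp)) != -1) {
        if (got > 0 && line[got - 1] == '\n')
            line[got - 1] = '\0';

        if (n == cap) {                     /* the array is full: grow it */
            size_t new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(lines, new_cap * sizeof *lines);
            if (tmp == NULL) {
                free(line);
                for (size_t i = 0; i < n; i++)
                    free(lines[i]);
                free(lines);
                return -1;
            }
            lines = tmp;
            cap = new_cap;
        }

        lines[n++] = line;                  /* the array now owns this buffer */
        line = NULL;                        /* make getline allocate a new one */
        bufsz = 0;
    }
    free(line);                             /* buffer from the final failed call */

    *lines_out = lines;                     /* NULL if the input was empty */
    *count_out = n;
    return 0;
}
```

Because the array grows to fit whatever is actually read, a file that gets longer while you are reading it just produces more lines; it cannot overrun anything.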

Upvotes: 1
