Reputation: 1860
I have a file (10-20MB) containing data, where each line is a single piece of data.
I have a C program that reads the file from the filesystem, and then based on command line input, it reads each line of the file, does a calculation on each line to determine if that line should be returned, and then return a subset of the data.
Assume that the program does an fread and reads the entire file into memory at the beginning, and then parses it directly from memory.
Would the program execute faster if, instead of reading it from the filesystem, I compiled the data into the program directly, by creating an array such as the following?
char *dataArray[] = {"data1", "data2", "data3"....};
Since the OS needs to read the entire binary from the filesystem, my gut feeling is that the execution time of both techniques would be similar, since reading from the filesystem would be the high order bit. However, would anyone have more definitive ideas on this?
Upvotes: 0
Views: 108
Reputation: 27478
Defining everything as a program literal will certainly be faster.
You do not need the relatively slow "open" call for the data file and you don't need to move the data from the buffer to your storage.
This was a common optimization circa. 1970, and every programming/coding style book since then stongly recommends you do not do this. The actual performance increase is minimal and what you gain in performance you lose in maintainability and flexibility.
Should you want a quick maintainable optimisation for this type of problem then look at the "mmap" call which makes the buffer directly available to your program and minimises data movement.
Upvotes: 3
Reputation: 215221
I doubt the difference in execution time will be significant, but from a memory utilization standpoint, putting the data in the executable (and qualifying it const
appropriately) will make a big difference.
If you read 10-20 megs of data from a file into memory allocated (e.g. via malloc
) in your program, the data initially exists in two places in memory: the filesystem cache, and your program's private memory. The former copy can be discarded if memory is tight, but the latter occupies physical memory or swap permanently until it's freed.
If on the other hand the 10-20 megs of data are part of your program's image (in the executable file), the data will be demand-paged, and can be discarded whenever needed because the OS knows it can reload the pages if it needs them again.
Upvotes: 1