Reputation: 5407
As I understand it, once a file has been read, its pages remain in RAM (managed roughly by an LRU policy) until they are evicted to make room for other data.
In my C program, I load a 124 MB text file to read its contents. Ideally, after the first execution the file should be in RAM, so when I run the same program again shortly afterwards it should be read from RAM only.
But the time taken is 15 s in both cases, regardless of how many times I execute the program.
And as the CPU cache is very small (about 3 MB), fitting the file in it is not possible.
What alternatives are there to speed up the program's execution?
Update:
Code link :
http://webdocs.cs.ualberta.ca/~sajib/cmput606/project/code/svm_classify.c - This file contains the main() function and performs the classification job
http://webdocs.cs.ualberta.ca/~sajib/cmput606/project/code/svm_common.c - This file contains the functions used to read the file and perform the classification
Upvotes: 1
Views: 165
Reputation: 7970
Loading a 120 MB file should take less than 1 second on a decent SSD, and 2-3 seconds on an HDD. I assume you don't read the file in large chunks but in small increments, using functions from the standard library (e.g. fscanf in C, or fstream in C++).
Try reading the file in large chunks (1-16 MB) and do your processing on that buffer; a sketch follows below.
If there are a lot of I/O calls to read the file, you pay a lot of overhead from switching back and forth between kernel and user mode, and from other processes competing for I/O.
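A minimal sketch of such chunked reading, assuming a hypothetical process_chunk() stands in for your own parsing logic and the 4 MB chunk size is just an illustrative choice:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (4 * 1024 * 1024)  /* 4 MB per fread call */

/* Hypothetical placeholder: parse one chunk of the file held in memory */
static void process_chunk(const char *buf, size_t len)
{
    (void)buf; (void)len;  /* your parsing logic goes here */
}

int main(void)
{
    FILE *fp = fopen("input.txt", "rb");
    if (!fp) { perror("fopen"); return 1; }

    char *buf = malloc(CHUNK_SIZE);
    if (!buf) { fclose(fp); return 1; }

    size_t n;
    while ((n = fread(buf, 1, CHUNK_SIZE, fp)) > 0)
        process_chunk(buf, n);   /* parse from memory instead of fscanf-ing the stream */

    free(buf);
    fclose(fp);
    return 0;
}

With a handful of multi-megabyte reads, the kernel/user transitions drop from one per fscanf call to a few dozen in total.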
Edit:
There are lots of calls to fscanf and gets. Try reading the whole file into a single buffer and work on that buffer. Use read (not fread) to read the file in one shot.
If the file is too big, split it into 1 MB reads; a sketch of the one-shot read follows below.
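A minimal sketch of the one-shot read, assuming a POSIX system (slurp is a hypothetical helper name; error handling is kept terse):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read an entire file into a NUL-terminated heap buffer.
   Returns NULL on error; the caller must free() the result. */
char *slurp(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    char *buf = malloc((size_t)st.st_size + 1);
    if (!buf) { close(fd); return NULL; }

    size_t total = 0;
    while (total < (size_t)st.st_size) {
        ssize_t n = read(fd, buf + total, (size_t)st.st_size - total);
        if (n <= 0) { free(buf); close(fd); return NULL; }
        total += (size_t)n;
    }
    buf[total] = '\0';   /* so string functions can walk the buffer */
    close(fd);
    if (out_len) *out_len = total;
    return buf;
}

The returned buffer is NUL-terminated, so the parsing step described in the next edit can treat it as one big string.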
Edit2:
In the function read_model, replace fscanf with sscanf so you work on a buffer.
Read the whole model in one shot into a large buffer of the file's size; the file size can be found with stat. Instead of using fgets, iterate over the buffer with strtok, which can replace the newlines with NUL characters while iterating; see the sketch below.
If you don't know any of these functions, try googling man funcname, e.g. man strtok.
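A hedged sketch of that buffer-based parsing; the %lf format is just an illustrative assumption, not svm_classify's actual model format:

#include <stdio.h>
#include <string.h>

/* Walk a NUL-terminated buffer line by line. strtok overwrites each '\n'
   with '\0' as it advances, so every token is one complete line. */
void parse_buffer(char *buf)
{
    for (char *line = strtok(buf, "\n"); line; line = strtok(NULL, "\n")) {
        double weight;
        /* sscanf parses in-memory strings the way fscanf parses streams */
        if (sscanf(line, "%lf", &weight) == 1) {
            /* ... use weight ... */
        }
    }
}

Note that strtok modifies the buffer in place and keeps hidden internal state, so use strtok_r instead if you ever parse from multiple threads.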
Upvotes: 2
Reputation: 385
If you read the file as a whole, it will be in RAM if your OS caches it. If, between two runs, cache pressure makes your OS (the Linux kernel, for example) throw away the cached file, your program will read it from disk again.
However, your program has no control over whether the file comes from the cache or not. The OS hands your program the file; whether it comes from disk or from the file cache is outside your control.
Some more information can be found in this little article: Experiments and fun with the Linux disk cache
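On Linux you can at least observe (though not control) this: a minimal sketch, assuming a Linux system, that mmaps a file and asks mincore which of its pages are currently resident in the page cache:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    size_t pages = ((size_t)st.st_size + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(pages);
    if (!vec || mincore(map, st.st_size, vec) < 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1;   /* low bit set => page is in the page cache */

    printf("%zu of %zu pages resident in RAM\n", resident, pages);
    return 0;
}

Running it before and after reading the file shows the residency jump, much like the md5sum experiment in the next answer.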
Upvotes: 1
Reputation: 26954
Once your file is read for the first time, under a normally configured OS it's very likely that the involved disk pages are effectively cached.
Assuming that this memory is not required for other processes, second reads will be way faster than the first one.
As a quick test, we generate a random file and calculate md5sum twice (example in Linux):
$ dd if=/dev/urandom of=/tmp/readtest count=124 bs=1M
$ echo 3 > /proc/sys/vm/drop_caches # needs to be run as root
$ time md5sum /tmp/readtest
f788abe8a8d120a87bb293e65e5d50ff /tmp/readtest
real 0m5.706s
user 0m0.332s
sys 0m0.072s
$ time md5sum /tmp/readtest
f788abe8a8d120a87bb293e65e5d50ff /tmp/readtest
real 0m0.295s
user 0m0.268s
sys 0m0.024s
Observe the huge difference after dropping the cached pages.
There are reasons why you may not observe this speedup: most importantly, if other processes need the memory in between runs, the cached pages will be evicted and the second read will hit the disk again.
Upvotes: 3