user123

Reputation: 5407

Working with an in-memory file

As I understand it, if we load a file once for reading, it remains in RAM (per an LRU policy) until it gets evicted by other data.

In my C program, I am loading a 124 MB text file to read its contents. Ideally, once I execute the program the file should be in RAM, and the next time I execute the same program shortly afterwards it should be served from RAM only.

But the time taken is about 15 s in both cases, regardless of how many times I execute the same program.

And since the CPU cache is very limited (about 3 MB), keeping the file in that cache is not possible.

What other alternatives are there to speed up the program's execution?

Update:

Code link :

http://webdocs.cs.ualberta.ca/~sajib/cmput606/project/code/svm_classify.c - This file contains the main() function and performs the classification job

http://webdocs.cs.ualberta.ca/~sajib/cmput606/project/code/svm_common.c - This file contains the functions that are used to read the file and perform the classification

Upvotes: 1

Views: 165

Answers (3)

egur

Reputation: 7970

Loading a 120 MB file should take less than 1 second from a decent SSD, and 2-3 seconds from an HDD. I assume you don't read the file in large chunks but in small increments using standard-library functions (e.g. fscanf or fstream).

Try reading the file in large chunks (1-16MB) and do your processing on that buffer.

If you make a lot of I/O calls to read the file, you get a lot of overhead from switching back and forth between user and kernel mode, and from other processes asking for I/O.

Edit: there are lots of calls to fscanf and fgets. Try reading the whole file into a single buffer and work on that buffer. Use read (not fread) to read the file in one shot.

If the file is too big, split it into 1 MB reads.
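
A minimal sketch of that chunked approach (the 1 MB chunk size and the commented-out processing step are just placeholders, not taken from the linked code):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)   /* 1 MB per read(); tune between 1 and 16 MB */

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(CHUNK_SIZE);
    ssize_t n;

    /* One read() per megabyte instead of thousands of small fscanf() calls,
       so far fewer user/kernel mode switches. */
    while ((n = read(fd, buf, CHUNK_SIZE)) > 0) {
        /* process_chunk(buf, n);  -- parse the bytes you just read here */
    }
    if (n < 0)
        perror("read");

    free(buf);
    close(fd);
    return 0;
}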

Edit 2:

In the function read_model, replace fscanf with sscanf so that it works on a buffer. Read the whole model in one shot into a large buffer of the file's size; the file size can be found using stat. Instead of using fgets, iterate over the buffer using strtok, which can replace the newlines with NUL characters as it goes.

If you don't know any of these functions, try googling man funcname, e.g. man strtok.
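
Putting those pieces together, a rough sketch could look like the following (the slurp_file helper and the %lf format string are only illustrations, not the actual model format or code from svm_common.c):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file into one NUL-terminated buffer, then parse it line
   by line with strtok/sscanf instead of fscanf/fgets on a FILE stream. */
static char *slurp_file(const char *path, off_t *size_out)
{
    struct stat st;
    if (stat(path, &st) < 0)                 /* file size via stat() */
        return NULL;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    char *buf = malloc(st.st_size + 1);
    if (!buf) { close(fd); return NULL; }

    ssize_t got = 0, n;
    while (got < st.st_size &&
           (n = read(fd, buf + got, st.st_size - got)) > 0)
        got += n;                            /* read() may return short counts */
    close(fd);

    buf[got] = '\0';                         /* make it a C string for strtok */
    *size_out = got;
    return buf;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s modelfile\n", argv[0]); return 1; }

    off_t size;
    char *buf = slurp_file(argv[1], &size);
    if (!buf) { perror(argv[1]); return 1; }

    /* strtok overwrites each '\n' with '\0' and hands back one line at a time. */
    for (char *line = strtok(buf, "\n"); line; line = strtok(NULL, "\n")) {
        double value;
        if (sscanf(line, "%lf", &value) == 1) {   /* placeholder line format */
            /* use the parsed value ... */
        }
    }

    free(buf);
    return 0;
}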

Upvotes: 2

Yamakuzure

Reputation: 385

If you read the file as a whole, the file will stay in RAM as long as your OS caches it. If, between two runs, cache pressure makes your OS (the Linux kernel, for example) throw the cached file away, your program will read it from disk again.

However, your program has no control over whether the file comes from the cache or not. The OS hands your program the file; whether it comes from the disk or from the file cache is outside your control.

Some more information can be found in this little article: Experiments and fun with the Linux disk cache

Upvotes: 1

jjmontes

Reputation: 26954

Once your file is read for the first time, under a normally configured OS it's very likely that the involved disk pages are effectively cached.

Assuming that this memory is not required for other processes, second reads will be way faster than the first one.

As a quick test, we generate a random file and calculate md5sum twice (example in Linux):

$ dd if=/dev/urandom of=/tmp/readtest count=124 bs=1M

$ echo 3 > /proc/sys/vm/drop_caches  # needs to be run as root

$ time md5sum /tmp/readtest 
f788abe8a8d120a87bb293e65e5d50ff  /tmp/readtest

real    0m5.706s
user    0m0.332s
sys 0m0.072s

$ time md5sum /tmp/readtest 
f788abe8a8d120a87bb293e65e5d50ff  /tmp/readtest

real    0m0.295s
user    0m0.268s
sys 0m0.024s

Observe the huge difference after dropping the cached pages.
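
If you want to watch the same effect from inside a C program, a minimal sketch is to time one full sequential read and run the program twice; none of this is taken from the original code, it is just an illustrative timing harness:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Time one full sequential read of the file. Run it twice: the second run
   should be much faster if the pages are still in the OS page cache. */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    enum { BUF_SIZE = 1 << 20 };             /* 1 MB read buffer */
    char *buf = malloc(BUF_SIZE);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    ssize_t n;
    long long total = 0;
    while ((n = read(fd, buf, BUF_SIZE)) > 0)
        total += n;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read %lld bytes in %.3f s\n", total, secs);

    free(buf);
    close(fd);
    return 0;
}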

There are reasons why you might not observe this:

  • The file is actually already cached when you first read it (most likely)
  • Caching is disabled for this disk or partition, or unsupported for the filesystem/device (very unlikely).

Upvotes: 3
