Gowtham Jayaram

Optimize file open and read

I have a C++ application running on Windows that wakes up every 15 minutes to open and read the files in a directory. The directory changes on every run.

On each run, this open-and-read operation takes around 18-23 minutes on a dual-core machine with a disk spindle speed of 6000 RPM. I have captured the memory page faults/sec, and they are in the range of 8000-10000.

Is there a way to reduce the page faults and optimize the file open and read operations?

Gowtham

Upvotes: 4

Views: 3465

Answers (4)

First, thanks for all the answers. They were very helpful and gave us many avenues to explore.

We removed STL and used the C APIs (fopen & fread). This provided a slight improvement: the open-and-read operation for the above-mentioned data now takes 16-17 minutes.

We really nailed the problem by compressing the files, which reduced the size of each file from 50K to 8K. The time taken by the open-and-read operation dropped to 4-5 minutes.
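For illustration, a minimal sketch of the combined approach. The compression scheme isn't named above, so zlib's uncompress() and the readCompressed helper are assumptions, and the decompressed size is taken as known up front (~50K per file):

    #include <cstdio>
    #include <vector>
    #include <zlib.h>

    // Slurp one compressed file with a single fread, then inflate it in
    // memory. 'decompressedSize' must be known up front (~50K here).
    std::vector<unsigned char> readCompressed(const char* path,
                                              uLong decompressedSize)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return {};

        // Find the on-disk size and read the whole file in one I/O.
        std::fseek(f, 0, SEEK_END);
        long compressedSize = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);

        std::vector<unsigned char> compressed(compressedSize);
        std::fread(compressed.data(), 1, compressedSize, f);
        std::fclose(f);

        // zlib one-shot decompression into a pre-sized buffer.
        std::vector<unsigned char> out(decompressedSize);
        uLongf outLen = decompressedSize;
        if (uncompress(out.data(), &outLen, compressed.data(),
                       static_cast<uLong>(compressedSize)) != Z_OK)
            return {};
        out.resize(outLen);
        return out;
    }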

Thank you.

Upvotes: 1

Drakosha

Reputation: 12165

  1. Maybe you can use something like memoization: if a file did not change (you can save its last update time), you can reuse its contents from the last run, i.e. keep them in memory instead of re-reading (see the sketch after this list).

  2. I think you don't need FS caching, i.e. it may be better to open the files in O_DIRECT mode (that's Linux; the Windows equivalent is FILE_FLAG_NO_BUFFERING) and read every file in one I/O, i.e. create a buffer in memory of the file's size and read into it. This should reduce CPU and memory usage considerably.

  3. Multithreading, suggested in another answer, will also help, but not much. I suspect the bottleneck is the disk, which can perform only a limited number of I/O operations per second (100 is a reasonable estimate). That's why you need to reduce the number of I/O operations, e.g. via (1) or (2) above, or something else.
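A minimal sketch of item (1), assuming the Win32 file-attribute APIs; the FileCache and getContents names are illustrative, not from this answer:

    #include <windows.h>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    struct CachedFile {
        FILETIME lastWrite = {};
        std::vector<char> bytes;
    };

    class FileCache {
        std::map<std::string, CachedFile> cache_;
    public:
        // Returns cached contents, re-reading from disk only when the
        // file's last-write time differs from the cached one.
        const std::vector<char>& getContents(const std::string& path) {
            WIN32_FILE_ATTRIBUTE_DATA attr = {};
            GetFileAttributesExA(path.c_str(), GetFileExInfoStandard, &attr);

            CachedFile& entry = cache_[path];
            if (entry.bytes.empty() ||
                CompareFileTime(&entry.lastWrite, &attr.ftLastWriteTime) != 0) {
                entry.lastWrite = attr.ftLastWriteTime;
                entry.bytes = readWholeFile(path);
            }
            return entry.bytes;
        }
    private:
        // One bulk fread into a buffer sized to the whole file.
        static std::vector<char> readWholeFile(const std::string& path) {
            std::vector<char> buf;
            std::FILE* f = std::fopen(path.c_str(), "rb");
            if (!f) return buf;
            std::fseek(f, 0, SEEK_END);
            buf.resize(std::ftell(f));
            std::fseek(f, 0, SEEK_SET);
            std::fread(buf.data(), 1, buf.size(), f);
            std::fclose(f);
            return buf;
        }
    };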

Upvotes: 0

noel aye

Reputation: 351

According to the MS PSDK documentation, file caching may be used. And, IMHO, since you mentioned Windows, the native CreateFile, ReadFile and CloseHandle with appropriate flags may give better performance than STL.

On the other hand, according to your post it seems you only read, so caching may not increase performance significantly. But since the CPU is fast and disk I/O is usually slow, you may still use this kind of intermediate-buffer concept together with multithreading, i.e. running parallel read threads.
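A minimal sketch of this native-API approach; FILE_FLAG_SEQUENTIAL_SCAN is one plausible choice of "appropriate flag" for a single front-to-back read, though no specific flag is named above:

    #include <windows.h>
    #include <vector>

    // One CreateFile/ReadFile/CloseHandle round trip per file; the whole
    // (~50K) file is pulled into the buffer with a single ReadFile call.
    std::vector<char> readFileNative(const char* path)
    {
        HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING,
                               FILE_FLAG_SEQUENTIAL_SCAN, // hint: one forward pass
                               nullptr);
        if (h == INVALID_HANDLE_VALUE) return {};

        DWORD size = GetFileSize(h, nullptr);
        std::vector<char> buf(size);
        DWORD bytesRead = 0;
        ReadFile(h, buf.data(), size, &bytesRead, nullptr);
        CloseHandle(h);
        buf.resize(bytesRead);
        return buf;
    }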

Upvotes: 0

Pasi Savolainen

Reputation: 2500

Don't use STL iostreams if you can avoid them. They handle very difficult internationalization and translation/transformation issues, which makes them slow.

Most often the fastest way to read a file is to memory-map it (on Windows too; CreateFileMapping is the starting point). If at all possible, use a single file with a total size of 50'000*50K and index directly into it when writing/reading. You should also consider a DB (even SQLite) if the data is at all structured. This amount of data is small enough that it should stay in memory at all times. You could also try a ramdisk to avoid going to disk entirely (this will tax your error recovery in case of hardware/power failure).
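A minimal sketch of the memory-mapping route, starting from CreateFileMapping as suggested; the mapFile name and the trimmed error handling are assumptions:

    #include <windows.h>

    // Maps 'path' read-only; caller calls UnmapViewOfFile() on the result.
    // Pages are faulted in lazily as the data is actually touched.
    const char* mapFile(const char* path, DWORD* size)
    {
        HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return nullptr;

        *size = GetFileSize(file, nullptr);
        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY,
                                            0, 0, nullptr);
        CloseHandle(file);            // the mapping holds its own reference
        if (!mapping) return nullptr;

        const char* view = static_cast<const char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
        CloseHandle(mapping);         // the view holds its own reference
        return view;
    }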

Upvotes: 3
