Reputation: 90
My program reads a file, interleaving it as follows:
The file to be read is large. It is split into four parts, each of which is split into many blocks. My program first reads block 1 of part 1, then jumps to block 1 of part 2, and so on through part 4; then it goes back for block 2 of part 1, block 2 of part 2, and so forth.
Performance drops in tests. I believe the reason is that the kernel's page cache doesn't work efficiently with this access pattern. But the file is too large to mmap(), and it is located on NFS.
How do I speed up reading in such a situation? Any comments and suggestions are welcome.
Upvotes: 0
Views: 132
Reputation: 64308
You can break up the reading into linear chunks. For example, if your code looks like this:
// One 1-byte block per seek: the file position jumps on every read.
for (int block = 0; block < n_blocks; ++block) {
    for (int part = 0; part < n_parts; ++part) {
        fseek(file, part * n_blocks + block, SEEK_SET);
        data[part] = fgetc(file);
    }
    send(data); // consume n_parts interleaved bytes
}
change it to this:
// Read n_blocks_per_chunk consecutive blocks per seek (n_chunks =
// n_blocks / n_blocks_per_chunk), interleaving in memory instead of on disk.
for (int chunk = 0; chunk < n_chunks; ++chunk) {
    for (int part = 0; part < n_parts; ++part) {
        fseek(file, part * n_blocks + chunk * n_blocks_per_chunk, SEEK_SET);
        for (int block = 0; block < n_blocks_per_chunk; ++block) {
            data[block * n_parts + part] = fgetc(file);
        }
    }
    send(data); // now holds n_parts * n_blocks_per_chunk interleaved bytes
}
Then tune n_blocks_per_chunk for your cache: larger chunks mean fewer seeks, but the data buffer grows to n_parts * n_blocks_per_chunk bytes.
Upvotes: 0
Reputation: 4416
For each pair of blocks, read both in, process the first, and push the second onto a stack. When you reach the end of the file, start shifting values off the bottom of the stack, processing them one by one; a sketch follows.
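A minimal sketch of that scheme, taken literally, assuming fixed-size blocks read two at a time; process(), BLOCK_SIZE, and the file name are illustrative assumptions, and the "stack" is a plain grow-on-demand array drained from the bottom:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096                 /* assumed block size */

static void process(const char *block)  /* stand-in for real work */
{
    (void)block;
}

int main(void)
{
    FILE *f = fopen("bigfile", "rb");   /* "bigfile" is a placeholder */
    if (!f) { perror("fopen"); return 1; }

    char pair[2 * BLOCK_SIZE];
    char *saved = NULL;                 /* the "stack" of deferred blocks */
    size_t n_saved = 0, cap = 0;

    /* Read blocks in strictly sequential pairs: no seeking at all. */
    while (fread(pair, 1, sizeof pair, f) == sizeof pair) {
        process(pair);                  /* first block: handle now */

        if (n_saved == cap) {           /* second block: defer */
            cap = cap ? cap * 2 : 16;
            char *tmp = realloc(saved, cap * BLOCK_SIZE);
            if (!tmp) { perror("realloc"); free(saved); return 1; }
            saved = tmp;
        }
        memcpy(saved + n_saved++ * BLOCK_SIZE, pair + BLOCK_SIZE, BLOCK_SIZE);
    }

    /* End of file: drain the deferred blocks from the bottom up. */
    for (size_t i = 0; i < n_saved; ++i)
        process(saved + i * BLOCK_SIZE);

    free(saved);
    fclose(f);
    return 0;
}

Note that this buffers half of what you read in memory, so it only pays off if that much data fits in RAM.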
Upvotes: 0
Reputation: 36412
You may want to use posix_fadvise() to give the system hints about your usage: e.g. use POSIX_FADV_RANDOM to disable readahead, and possibly POSIX_FADV_WILLNEED to have the system read the next block into the page cache before you need it (if you can predict it). You could also try POSIX_FADV_DONTNEED once you are done reading a block, to have the system free the underlying cache pages, although this might not be necessary.
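A minimal sketch of how those hints might be combined, assuming four parts, 1 MiB blocks, and a pread()-based loop over the question's access pattern; the file name, sizes, and loop structure are illustrative, and whether the hints actually help over NFS depends on the kernel and mount options:

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define N_PARTS    4
#define BLOCK_SIZE (1 << 20)            /* assumed 1 MiB blocks */

int main(void)
{
    int fd = open("bigfile", O_RDONLY); /* "bigfile" is a placeholder */
    if (fd < 0) { perror("open"); return 1; }

    off_t file_size = lseek(fd, 0, SEEK_END);
    off_t part_size = file_size / N_PARTS;
    off_t n_blocks  = part_size / BLOCK_SIZE;

    /* The pattern looks random to the kernel, so disable readahead. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

    static char buf[BLOCK_SIZE];

    for (off_t block = 0; block < n_blocks; ++block) {
        for (int part = 0; part < N_PARTS; ++part) {
            off_t off = part * part_size + block * BLOCK_SIZE;

            /* We know which block comes next, so prefetch it. */
            off_t next = (part + 1 < N_PARTS)
                       ? (part + 1) * part_size + block * BLOCK_SIZE
                       : (block + 1) * BLOCK_SIZE;  /* wrap to part 0 */
            posix_fadvise(fd, next, BLOCK_SIZE, POSIX_FADV_WILLNEED);

            if (pread(fd, buf, BLOCK_SIZE, off) < 0) { perror("pread"); return 1; }
            /* ... process buf ... */

            /* This block is never read again; let its pages go. */
            posix_fadvise(fd, off, BLOCK_SIZE, POSIX_FADV_DONTNEED);
        }
    }
    close(fd);
    return 0;
}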
Upvotes: 1