LiJunjie

Reputation: 90

Speeding up reads for a Linux application

My program reads a file in an interleaved order:

The file is large. It is divided into four parts, and each part is divided into many blocks. My program first reads block 1 of part 1, then block 1 of part 2, and so on through part 4; then it goes back to block 2 of part 1, and the pattern repeats.
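
To make the access order concrete, here is a minimal sketch that computes the byte offset of each read. It assumes the four parts are equal in size, stored back to back, and made of fixed-size blocks; the question does not state the layout, so these are assumptions, and the names `layout_t` and `interleaved_offset` are hypothetical. Indices are 0-based.

```c
#include <stdint.h>

/* Hypothetical layout (not given in the question): n_parts equal parts
 * stored one after another, each holding blocks_per_part blocks of
 * block_size bytes. */
typedef struct {
    uint64_t n_parts;
    uint64_t blocks_per_part;
    uint64_t block_size;
} layout_t;

/* Byte offset of the k-th read in the interleaved order:
 * block 0 of parts 0..n_parts-1, then block 1 of each part, and so on. */
uint64_t interleaved_offset(const layout_t *l, uint64_t k)
{
    uint64_t block = k / l->n_parts;  /* block index within its part */
    uint64_t part  = k % l->n_parts;  /* which part is being visited */
    return (part * l->blocks_per_part + block) * l->block_size;
}
```

With four parts of 1000 blocks of 4096 bytes, reads 0 to 4 land at offsets 0, 4096000, 8192000, 12288000 and 4096, so consecutive reads are a whole part apart rather than sequential.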

Performance drops in my tests. I suspect the kernel's page cache does not work efficiently with this access pattern. However, the file is too large to mmap(), and it is located on NFS.

How do I speed up reading in such a situation? Any comments and suggestions are welcome.

Upvotes: 0

Views: 132

Answers (3)

Vaughn Cato

Reputation: 64308

You can break up the reading into sequential chunks: seek once per part per chunk, then read many blocks in a row. For example, if your code looks like this:

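/* Before: one seek per byte, hopping between the parts */
for (int block=0; block<n_blocks; ++block) {
  for (int part=0; part<n_parts; ++part) {
    fseek(file, part*n_blocks+block, SEEK_SET);  /* block `block` of part `part` */
    data[part] = (char)fgetc(file);              /* data holds n_parts bytes */
  }
  send(data);  /* your processing of one interleaved group */
}

change it to this:

/* After: one seek per part per chunk, then sequential reads */
for (int chunk=0; chunk<n_chunks; ++chunk) {
  for (int part=0; part<n_parts; ++part) {
    fseek(file, part*n_blocks+chunk*n_blocks_per_chunk, SEEK_SET);
    for (int block=0; block<n_blocks_per_chunk; ++block) {
      /* re-interleave in memory: data holds n_blocks_per_chunk*n_parts bytes */
      data[block*n_parts+part] = (char)fgetc(file);
    }
  }
  send(data);  /* n_blocks_per_chunk interleaved groups at once */
}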

Then tune n_blocks_per_chunk for your cache: larger chunks mean fewer seeks and longer sequential reads, at the cost of a larger data buffer.
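
The same chunking idea with multi-byte blocks might look like the following sketch. It uses pread() so no separate seek is needed, and it assumes equal-sized parts stored back to back, a fixed block size, and a chunk size that divides the number of blocks per part evenly; `read_chunk` and its parameters are hypothetical names, not part of the answer above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), used by callers to obtain fd */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* off_t, ssize_t, size_t */
#include <unistd.h>     /* pread */

/* Hypothetical layout: n_parts equal parts stored back to back, each holding
 * blocks_per_part blocks of block_size bytes; blocks_per_chunk is assumed to
 * divide blocks_per_part evenly.
 *
 * Reads chunk number `chunk`: for each part, one contiguous pread() request
 * (retried on short reads) fetches blocks_per_chunk blocks into `scratch`,
 * and those blocks are copied into `out` in interleaved order
 * (block 0 of every part, then block 1 of every part, ...).
 * scratch: blocks_per_chunk * block_size bytes
 * out:     n_parts * blocks_per_chunk * block_size bytes
 * Returns 0 on success, -1 on error or unexpected end of file. */
int read_chunk(int fd, size_t n_parts, size_t blocks_per_part,
               size_t block_size, size_t blocks_per_chunk, size_t chunk,
               unsigned char *scratch, unsigned char *out)
{
    size_t span = blocks_per_chunk * block_size;
    for (size_t part = 0; part < n_parts; ++part) {
        off_t off = (off_t)((part * blocks_per_part + chunk * blocks_per_chunk)
                            * block_size);
        size_t got = 0;
        while (got < span) {  /* pread() may return fewer bytes than requested */
            ssize_t n = pread(fd, scratch + got, span - got, off + (off_t)got);
            if (n <= 0)
                return -1;
            got += (size_t)n;
        }
        for (size_t b = 0; b < blocks_per_chunk; ++b)
            memcpy(out + (b * n_parts + part) * block_size,
                   scratch + b * block_size, block_size);
    }
    return 0;
}
```

A caller would open the file, then call read_chunk for chunk = 0, 1, ... up to blocks_per_part / blocks_per_chunk - 1, processing `out` after each call. Each call issues n_parts long sequential reads instead of n_parts * blocks_per_chunk short ones.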

Upvotes: 0

Barton Chittenden

Reputation: 4416

For each pair of blocks, read both in, process the first, and push the second onto a stack. When you reach the end of the file, start shifting values off the bottom of the stack (so it behaves as a first-in, first-out queue), processing them one by one.

Upvotes: 0

Hasturkun
Hasturkun

Reputation: 36412

You may want to use posix_fadvise() to give the kernel hints about your access pattern. For example, use POSIX_FADV_RANDOM to disable readahead, and possibly POSIX_FADV_WILLNEED to have the kernel start reading the next block into the page cache before you need it (if you can predict which block that is). You could also try POSIX_FADV_DONTNEED once you are done reading a block, so the kernel can free the underlying cache pages, although this might not be necessary.
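
A minimal sketch of those hints, assuming the program can list its block offsets in advance (the interleaved order in the question is predictable). `read_with_hints` is a hypothetical name, summing the bytes stands in for real processing, and whether the hints help on an NFS mount depends on the kernel, so it is worth measuring before and after.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open, posix_fadvise, POSIX_FADV_* */
#include <sys/types.h>  /* off_t, ssize_t, size_t */
#include <unistd.h>     /* pread */

/* Reads block_size bytes at each offset in offsets[], in the given order,
 * using posix_fadvise() hints. Summing the bytes is a placeholder for the
 * program's real processing. Returns the sum, or -1 on a read error or
 * short read. The hints are advisory, so their return values are ignored. */
long long read_with_hints(int fd, const off_t *offsets, size_t count,
                          size_t block_size, unsigned char *buf)
{
    long long sum = 0;

    /* The access pattern is not sequential: ask the kernel not to read ahead. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

    /* Ask for the first block to be fetched into the page cache. */
    if (count > 0)
        posix_fadvise(fd, offsets[0], (off_t)block_size, POSIX_FADV_WILLNEED);

    for (size_t i = 0; i < count; ++i) {
        /* The next offset is known, so start fetching it before it is needed. */
        if (i + 1 < count)
            posix_fadvise(fd, offsets[i + 1], (off_t)block_size,
                          POSIX_FADV_WILLNEED);

        ssize_t n = pread(fd, buf, block_size, offsets[i]);
        if (n != (ssize_t)block_size)
            return -1;  /* short reads are treated as errors for simplicity */

        for (size_t j = 0; j < block_size; ++j)
            sum += buf[j];  /* placeholder for real per-block processing */

        /* Done with this block: the kernel may drop its cached pages. */
        posix_fadvise(fd, offsets[i], (off_t)block_size, POSIX_FADV_DONTNEED);
    }
    return sum;
}
```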

Upvotes: 1
