Łukasz Lew

Reputation: 50328

How to speed up reading of a fixed set of small files on linux?

I have 100'000 1 kB files, and a program that reads them is really slow. My best idea for improving performance is to put them on a ramdisk, but this is a fragile solution: every restart needs the ramdisk set up again (and copying the files there is slow as well).

My second-best idea is to concatenate the files and work with that, but it is not trivial (a sketch of what I mean is below).

Is there a better solution?

Note: I need to avoid dependencies in the program, even Boost.
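
For concreteness, here is roughly what I mean by concatenating: a sketch only, where "files.txt", "blob.bin" and "blob.idx" are placeholder names I made up and paths are assumed to contain no spaces. It packs everything into one blob plus a small index once, then loads it back with a single sequential read.

    // Sketch: pack the small files into one blob plus an index once,
    // then read everything back in a single sequential pass.
    #include <cstdint>
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Build blob.bin + blob.idx from a newline-separated list of file paths.
    void Pack(const std::string& list_path) {
        std::ifstream list(list_path);
        std::ofstream blob("blob.bin", std::ios::binary);
        std::ofstream idx("blob.idx");
        std::string path;
        uint64_t offset = 0;
        while (std::getline(list, path)) {
            std::ifstream in(path, std::ios::binary);
            std::vector<char> data((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
            blob.write(data.data(), data.size());
            idx << path << ' ' << offset << ' ' << data.size() << '\n';
            offset += data.size();
        }
    }

    // Load the whole blob with one read and expose the individual files.
    std::unordered_map<std::string, std::string> Load() {
        std::ifstream blob("blob.bin", std::ios::binary);
        std::vector<char> all((std::istreambuf_iterator<char>(blob)),
                              std::istreambuf_iterator<char>());
        std::unordered_map<std::string, std::string> files;
        std::ifstream idx("blob.idx");
        std::string path;
        uint64_t offset, size;
        while (idx >> path >> offset >> size)
            files[path] = std::string(all.data() + offset, size);
        return files;
    }

    int main() {
        Pack("files.txt");    // one-time packing step
        auto files = Load();  // fast path used afterwards
    }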

Upvotes: 2

Views: 1004

Answers (2)

cmcginty

Reputation: 117076

If your files are static, I agree: just tar them up and then place that in a RAM disk. It would probably be faster to read directly out of the tar file, but you can test that.
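
If you do try reading straight out of the archive, an uncompressed ustar file is simple enough to walk by hand. A minimal sketch, assuming plain regular files with names under 100 bytes; "archive.tar" is a placeholder path:

    // Sketch: iterate an uncompressed (ustar) tar archive held in memory.
    #include <cstdint>
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream in("archive.tar", std::ios::binary);
        std::vector<char> tar((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());
        std::size_t pos = 0;
        while (pos + 512 <= tar.size()) {
            const char* hdr = tar.data() + pos;
            if (hdr[0] == '\0') break;               // zero block marks end of archive
            std::size_t len = 0;                     // name: offset 0, up to 100 bytes
            while (len < 100 && hdr[len] != '\0') ++len;
            std::string name(hdr, len);
            uint64_t size = std::strtoull(hdr + 124, nullptr, 8);  // size: octal at offset 124
            char type = hdr[156];                    // '0' or NUL means regular file
            const char* data = tar.data() + pos + 512;
            if (type == '0' || type == '\0') {
                // data .. data + size is the file content; hand it to the parser here.
                std::cout << name << " (" << size << " bytes at offset "
                          << (data - tar.data()) << ")\n";
            }
            pos += 512 + ((size + 511) / 512) * 512;  // skip header + padded data blocks
        }
    }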

Edit: instead of tar, you could also try creating a squashfs volume.

If you don't want to do that, or still need more performance, then:

  1. put your data on an SSD.
  2. start investigating filesystem performance, starting with ext4, XFS, etc.

Upvotes: 1

sehe

Reputation: 393457

You can optimize by storing the files contiguously on disk.

On a disk with ample free space, the easiest way would be to read a tar archive instead.

Other than that, there is (or used to be) a Debian package for 'readahead'.

You can use that tool to

  1. profile a normal run of your software
  2. edit the list of files accessed (detected by readahead)

You can then call readahead with that file list (it will order the files in disk order, so throughput is maximized and seek times are minimized).

Unfortunately, it has been a while since I used these, so I hope you can google for the respective packages.

This is what I seem to have found now:

sudo apt-get install readahead-fedora
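
If pulling in that package is not an option (the question wants to avoid dependencies), roughly the same idea can be approximated inside the program with posix_fadvise: ask the kernel to start prefetching every file up front, then read them as usual. A sketch, assuming a newline-separated list of paths in a hypothetical "files.txt"; unlike the readahead tool, it does not reorder the list by on-disk position.

    // Sketch: hint the kernel to prefetch all files before the real reads start.
    #include <fcntl.h>
    #include <unistd.h>
    #include <fstream>
    #include <string>

    int main() {
        std::ifstream list("files.txt");
        std::string path;
        while (std::getline(list, path)) {
            int fd = open(path.c_str(), O_RDONLY);
            if (fd < 0) continue;
            // POSIX_FADV_WILLNEED queues asynchronous readahead of the whole file.
            posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
            close(fd);
        }
        // ...then open and read the files normally; most of the data should
        // already be in the page cache when it is needed.
    }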

Good luck

Upvotes: 2
