MWB

Reputation: 12567

How to make the OS schedule disk accesses optimally?

Suppose that a process needs to access the file system in many (1000+) places, and the order is not important to the program logic. However, the order obviously matters for performance if the file system is stored on a (spinning) hard disk.

How can the application programmer communicate to the OS that it should schedule the accesses optimally? Launching 1000+ threads does not seem practical. Does database management software accomplish this, and if so, then how?

Additional details: I had a large (1TB+) mmapped file where I needed to read 1000+ chunks of about 1KB, each time in new, unpredictable places.

Upvotes: 2

Views: 138

Answers (3)

The kernel will already reorder the read/write requests (e.g. to suit the rotation of a mechanical disk) if they come from several processes or threads. BTW, most of the reads & writes would go to the kernel's file system cache, not to the disk.

You might consider using posix_fadvise(2) and perhaps (in a separate thread) readahead(2). If, instead of read(2)-ing, you use mmap(2) to map some file portion into virtual memory, you might also use madvise(2).
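
For example, a minimal sketch of queueing POSIX_FADV_WILLNEED hints for all the scattered reads before actually reading them, so the kernel can fetch them in whatever order it prefers; the names fd, offsets and nchunks are only illustrative, not from the question:

    /* Sketch: hint the kernel about many upcoming reads so its I/O
       scheduler can fetch and reorder them as it sees fit. */
    #define _POSIX_C_SOURCE 200112L   /* for posix_fadvise */
    #include <fcntl.h>
    #include <stdio.h>

    #define CHUNK_SIZE 1024

    static void hint_reads(int fd, const off_t *offsets, size_t nchunks)
    {
        for (size_t i = 0; i < nchunks; i++) {
            /* POSIX_FADV_WILLNEED asks the kernel to start readahead for
               this range; it returns immediately, so all hints are queued
               before any blocking read() is issued. */
            int err = posix_fadvise(fd, offsets[i], CHUNK_SIZE,
                                    POSIX_FADV_WILLNEED);
            if (err != 0)
                fprintf(stderr, "posix_fadvise: error %d\n", err);
        }
    }

Once the hints are queued, the subsequent read(2) calls mostly hit pages the kernel has already brought into its cache, in an order it chose itself.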

Of course, the file system does not usually guarantee that a sequential portion of a file is physically laid out sequentially on the disk (and even the disk firmware might reorder sectors). See the picture in the Ext2 wikipage; it is also relevant for Ext4. Some file systems might be better in that respect, and you could tune their block size (at mkfs time).

I would not recommend having thousands of threads (a few dozen at most).

Finally, it might be worth buying an SSD or some more RAM (for the file cache). See http://linuxatemyram.com/

Actual performance would depend a lot on the particular system and hardware.

Perhaps using an indexed file library like GDBM or a database library like SQLite (or a real database like PostgreSQL) might be worthwhile! Having fewer but bigger files could also help.

BTW, you are mmap-ing and reading small chunks of 1KB (smaller than the 4KB page size). You could use madvise (if possible, in advance), but you should try to read larger chunks, since every file access will bring in at least a whole page.
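
A minimal sketch of that advice, assuming the file is already mmap-ed at base and the 1KB chunk offsets are known up front (base, offsets and nchunks are illustrative names): issue MADV_WILLNEED for every chunk at once, so the kernel can fetch the pages in whatever order suits the disk.

    #define _DEFAULT_SOURCE           /* for madvise() on glibc */
    #include <sys/mman.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK_SIZE 1024

    static void prefetch_chunks(char *base, const size_t *offsets,
                                size_t nchunks)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);

        for (size_t i = 0; i < nchunks; i++) {
            /* madvise needs a page-aligned start address, so round the
               chunk offset down to the page containing it. */
            size_t start = offsets[i] & ~(page - 1);
            size_t len   = offsets[i] + CHUNK_SIZE - start;

            if (madvise(base + start, len, MADV_WILLNEED) != 0)
                perror("madvise");
        }
    }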

You really should benchmark!

Upvotes: 1

Fabio A. Correa

Reputation: 2036

Files and their transactions are cached in various devices in your computer; RAM and the HD cache are the most usual places. The file system driver may also implement IO transaction queues, defragmentation, and error-correction logic that makes things complicated for the developer who wants to control every aspect of file access. This level of complexity is ultimately designed to provide integrity, security, performance, and coordination of file access across all processes of your system.

Optimization efforts should not interfere with the system's own caching and prediction algorithms, not just for IO but for all caches. Trying to second-guess your system is a waste of your time and your processors' time.

Most probably, your IO operations and data will stay in the caches and be committed to your storage devices later, when your OS sees fit.

That said, there are always options like database suites, mmap, readahead mechanisms, and direct IO to your drive. You will need to invest time in benchmarking any of your efforts. I advise against multiple IO threads because cache contention can make things even slower than a single thread.
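
As a rough sketch of the direct IO option, assuming Linux and a 4096-byte block size (the path and names here are only illustrative): O_DIRECT bypasses the page cache, which only pays off if the application does its own caching and scheduling.

    #define _GNU_SOURCE               /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/path/to/datafile", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* O_DIRECT requires the buffer, offset and length to be aligned,
           typically to the device's logical block size (often 512 or 4096). */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            close(fd);
            return 1;
        }

        if (pread(fd, buf, 4096, 0) < 0)
            perror("pread");

        free(buf);
        close(fd);
        return 0;
    }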

Upvotes: 2

xmojmr

Reputation: 8145

In the early days, when parameters like Wikipedia: Hard disk drive performance characteristics → Seek time were very expensive and thus very important, database vendors paid attention to the on-disk data representation and layout, as can be seen e.g. in Oracle8i: Designing and Tuning for Performance → Tuning I/O.

The important optimization parameters changed with the appearance of solid-state drives (SSDs), where the seek time is zero (or at least constant) because there is nothing to rotate. Some of the new parameters are addressed in Wikipedia: Solid-state drive (SSD) → optimized file systems.

But even those optimization parameters go away with the use of Wikipedia: In-memory databases. The list of vendors is pretty long, with all the big players on it.

So how to schedule your accesses optimally depends a lot on the use case (1000 concurrent hits is not a sufficient problem description). Buying some RAM is one of the options, and "how can the programmer communicate with the OS" will be one of the last (not the first) questions to ask.

Upvotes: 2
