Reputation: 15100
Suppose there are N files that must be completely written to disk (i.e. flushed from all file buffers). For each file we write a small amount of data (small relative to HDD seek time), e.g. 64KB, with WriteFile, and then call FlushFileBuffers on that file to ensure that its data is completely flushed to the hard drive.
If we write and flush the files one by one, I expect the total time to be approximately N*seekTime + N*writeTime, where seekTime is the time to position the hard drive head over the proper sector (which may take up to the time of one full disk rotation), and writeTime is the time it takes the disk to write 64KB of data sequentially. With this one-by-one approach we give the OS no room for optimization, because we dictate the order in which the files must be flushed.
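To make the one-by-one baseline concrete, here is a minimal timing sketch. It is written in Python rather than against the Win32 API directly, on the assumption that os.fsync is an acceptable stand-in for FlushFileBuffers (on Windows, CPython implements it via the C runtime's _commit). The file names, the file count and the helper names are made up for illustration.

```python
import os
import tempfile
import time

CHUNK = 64 * 1024  # 64 KB per file, as in the question


def write_and_flush(path, data):
    """Write data to path and force it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's own userspace buffer
        os.fsync(f.fileno())  # ask the OS to flush its buffers to the device


def sequential(directory, n):
    """Write and flush n files one after another; return elapsed seconds."""
    data = os.urandom(CHUNK)
    start = time.perf_counter()
    for i in range(n):
        write_and_flush(os.path.join(directory, f"file_{i}.bin"), data)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        n = 100  # hypothetical file count
        elapsed = sequential(d, n)
        print(f"{n} files: {elapsed:.3f} s, {elapsed / n * 1000:.2f} ms per file")
```

Dividing the elapsed time by N gives a rough per-file cost to compare against seekTime + writeTime. Drive or controller write caching can make the measured figure much smaller than the model predicts.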
With some support from the OS, performance could be improved by reordering the write and flush operations according to the disk's rotation (i.e. the current position of the head): start with the files that need almost no rotation (those nearest to the current head position) and end with those that need almost a full rotation.
The question is: does the operating system (Windows in particular) provide such an optimization? In other words, can performance be improved by running the write and flush operations in parallel in N threads, one thread per file? Or will that cause extra head repositioning and decrease performance (a kind of context switching for the hard drive)?
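The one-thread-per-file variant can be sketched as follows, again in Python and again assuming os.fsync as a stand-in for FlushFileBuffers. The function names and the pool sizing are illustrative choices, not something prescribed by the question. CPython releases the GIL around blocking calls such as os.fsync, so the flushes can genuinely be in flight at the same time.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # 64 KB per file, as in the question


def write_and_flush(path, data):
    """Write data to path and force it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def parallel(directory, n, workers=None):
    """Write and flush n files concurrently; return elapsed seconds."""
    data = os.urandom(CHUNK)
    paths = [os.path.join(directory, f"file_{i}.bin") for i in range(n)]
    start = time.perf_counter()
    # Default to one thread per file, mirroring the question's proposal.
    with ThreadPoolExecutor(max_workers=workers or max(n, 1)) as pool:
        # list() waits for every task and re-raises any exception from a worker.
        list(pool.map(lambda p: write_and_flush(p, data), paths))
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        n = 100  # hypothetical file count
        print(f"parallel: {parallel(d, n):.3f} s for {n} files")
```

Running this and a sequential loop on the same disk, several times each, is the only reliable way to see whether the OS, driver or drive reorders the queued writes to any benefit. The result will differ between HDDs, SSDs and cached controllers.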
Upvotes: 4
Views: 1585
Reputation: 1
You need to benchmark, since this is operating system, file system and hardware specific. On my Linux system, many file operations go through the page cache, so if two programs (or the same program run twice) access a file at nearly the same time, the later access might not involve any physical disk I/O. Linux and POSIX even have some system calls to help the page cache (posix_fadvise(2), madvise(2), readahead(2)...)
I don't know Windows, but I have heard, and believe, the rumor that it is less efficient than Linux at such caching.
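As an illustration of one of those page-cache hints, here is a small sketch that issues posix_fadvise from Python. The helper name is hypothetical, and the call is only a hint: the kernel is free to ignore it. It is guarded because os.posix_fadvise does not exist on every platform (Windows, for example).

```python
import os


def hint_will_need(path):
    """Ask the kernel to prefetch path into the page cache, if supported.

    Returns True if the hint was issued, False if the platform lacks it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # e.g. on Windows
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0 with length 0 covers the file from the start to its end
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True
```

This affects reads rather than flushed writes, so it does not change the cost of the write and flush pattern in the question. It only shows the kind of cache control the answer refers to.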
Hardware limitations are often a very significant bottleneck. Replacing your disk with an SSD might be worth the cost.
AFAIK, old BSD, SunOS and Linux disk drivers did the optimization you are suggesting (reordering I/O operations to reduce seek and rotational delays). Today it does not really matter, since the disk controller itself maps "logical" sectors to "physical" ones.
Upvotes: 2
Reputation: 1031
You should first ask yourself, and explain here, why you need to flush. What you want to achieve is not necessarily what actually happens.
If you actually want to optimize an application so that it produces a certain access pattern on a physical device, then you make your solution very hardware dependent. What looks like an optimization in your test cases may have the opposite effect in another scenario. For example, what about file fragmentation? What about RAID arrays? What about network file systems? What about SSDs? What about concurrent accesses to the same disk by other processes running on the same machine?
The key to making disk accesses fast is buffering. Don't defeat it if you don't absolutely need to defeat it.
Upvotes: 3
Reputation: 171246
I believe Windows does no IO scheduling whatsoever; in fact, it even breaks large IOs into 256KB pieces. Linux has IO scheduling built in.
That said, some drivers and disks do some reordering. Usually the IO/sec rate increases with queue depth, up to a point. CrystalDiskMark has a QD32 mode.
SSDs certainly reorder, which is easy to see from benchmarks at high queue depth. SSDs also have hardware parallelism, so they get faster for random reads as you increase the queue depth.
What I found with my desktop disk on Windows is that small sequential write-through IOs complete at a much higher rate than the disk's seek rate would allow. Either the controller is write-caching, or the disk geometry is really amenable to sequential writes even when they are not cached.
Upvotes: 1