Reputation: 4505
I have some 2 TB read-only (no writing once created) files on a RAID 5 (4 x 7.2k @ 3TB) system.
Now I have some threads that want to read portions of those files. Each thread has an array of chunks it needs; each chunk is addressed by a file offset (position) and a size (mostly about 300 bytes) to read.
What is the fastest way to read this data? I don't care about CPU cycles; (disk) latency is what counts. So, if possible, I want to take advantage of the hard disks' NCQ.
As the files are highly compressed, will be accessed randomly, and I know the positions exactly, I have no other way to optimize this.
What is the best way to read the data? Do you have any experience, tips, or hints?
Upvotes: 13
Views: 1140
Reputation: 65314
The optimum number of parallel requests depends highly on factors outside your app (e.g. disk count = 4, NCQ depth = ?, driver queue depth = ?, ...), so you might want to use a system that can adapt or be adapted. My recommendation is: put every read order into a single queue, have a fixed but configurable number of reader threads dequeue the orders and synchronously read the chunks, and tune the thread count to your hardware.
Why sync reads? They have lower latency than async reads. Why spend latency on a queue at all? A good lock-free queue implementation starts at less than 10 ns of latency, much less than two thread switches.
Update: Some Q/A
Should the read threads keep the files open? Yes, definitely.
Would you use a FileStream with FileOptions.RandomAccess? Yes
You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.
Upvotes: 4
Reputation: 782
Disks are "single threaded" because there is only one head. It won't go faster no matter how many threads you use... in fact more threads probably will just slow things down. Just get yourself the list and arrange (sort) it in the app.
You can of course use many threads that'd make use of NCQ probably more efficient, but arranging it in the app and using one thread should work better.
If the file is fragmented - use NCQ and a couple of threads because you then can't know exact position on disk so only NCQ can optimize reads. If it's contignous - use sorting.
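As a rough illustration of the sorted, single-thread approach (the Chunk type and the method names are assumptions, not from the answer): sort the requests by offset so one thread sweeps the disk in one direction instead of seeking back and forth.

```csharp
using System;
using System.IO;
using System.Linq;

record Chunk(long Offset, int Length);

static class SortedReader
{
    // Reads all chunks with a single thread, in ascending offset order.
    // Results are returned in offset order, not in the caller's original order.
    public static byte[][] ReadSorted(string path, Chunk[] chunks)
    {
        var ordered = chunks.OrderBy(c => c.Offset).ToArray();
        var results = new byte[ordered.Length][];

        using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 1, FileOptions.RandomAccess);

        for (int i = 0; i < ordered.Length; i++)
        {
            var buffer = new byte[ordered[i].Length];
            stream.Seek(ordered[i].Offset, SeekOrigin.Begin);
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;   // unexpected end of file
                read += n;
            }
            results[i] = buffer;
        }
        return results;
    }
}
```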
You may also try direct I/O to bypass OS caching and read the whole file sequentially... it can sometimes be faster, especially if there is no other load on this array.
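A small sketch of the sequential sweep in C#. Note this is only buffered sequential reading with a read-ahead hint: FileOptions has no named value for unbuffered (direct) I/O, so true direct I/O would need the Win32 FILE_FLAG_NO_BUFFERING flag and sector-aligned buffers, which is omitted here.

```csharp
using System;
using System.IO;

// Sequentially sweep the whole file in large blocks. FileOptions.SequentialScan
// hints the OS to read ahead aggressively; it does NOT bypass the cache the way
// FILE_FLAG_NO_BUFFERING (real direct I/O) would.
static class SequentialSweep
{
    public static void Sweep(string path, Action<byte[], int, long> onBlock)
    {
        const int BlockSize = 4 * 1024 * 1024;   // 4 MiB per read
        var buffer = new byte[BlockSize];
        long position = 0;

        using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 1, FileOptions.SequentialScan);

        int n;
        while ((n = stream.Read(buffer, 0, BlockSize)) > 0)
        {
            onBlock(buffer, n, position);        // hand each block to the caller
            position += n;
        }
    }
}
```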
Upvotes: 0