I have a hundred million files, and my program reads all of them at every startup. I have been looking for ways to make this process faster. Along the way, I noticed something strange: my CPU has 4 physical cores, yet reading this many files with even higher thread counts yields much better results. This is interesting, given that creating more threads than the CPU's logical core count should be somewhat pointless.
8 Threads: 29.858 s
16 Threads: 15.882 s
32 Threads: 9.989 s
64 Threads: 7.965 s
128 Threads: 8.275 s
256 Threads: 8.159 s
512 Threads: 8.098 s
1024 Threads: 8.253 s
4096 Threads: 8.744 s
16001 Threads: 10.033 s
Why might this occur? Is it some disk bottleneck?
read(), open() and close()
Upvotes: 3
Views: 189
Why might this occur?
If you open one file at "/a/b/c/d/e" and then read one block of data from the file, the OS may have to fetch directory info for "/a", then fetch directory info for "/a/b", then fetch directory info for "/a/b/c", then... It might add up to a total of 6 blocks fetched from disk (5 blocks of directory info, then one block of file data), and those blocks might be scattered all over the disk.
If you open 100 million files and read one block of file data from each, this might involve fetching 600 million things (500 million pieces of directory info and 100 million pieces of file data).
What is the optimal order to do these 600 million things?
Often there are directory info caches and file data caches involved (and all requests that can be satisfied by already-cached data should be done ASAP, before that data is evicted from the cache(s) to make room for other data). Often the disk hardware also has rules (e.g. it's faster to access all blocks within the same "group of disk blocks" before switching to the next group). Sometimes there's parallelism in the disk hardware (e.g. two requests from the same zone can't be done in parallel, but two requests from different zones can).
The optimal order to do these 600 million things is something the OS can figure out; but only if the OS actually knows about all of them.
If you have (e.g.) 8 threads that each send one request (e.g. to open a file) and then block (using no CPU time) until the pending request completes, the OS will only know about at most 8 requests at a time. In other words, the operating system's ability to optimize the order in which file IO requests are performed is constrained by the number of pending requests, which is constrained by the number of threads you have.
Ideally, a single thread would be able to ask the OS to "open all the files in this list of a hundred million files", so that the OS could fully optimize the order (with the least thread management overhead). Sadly, most operating systems don't support anything like this (e.g. POSIX asynchronous IO fails to support any kind of "asynchronous open").
Having a large number of threads (all blocked and using no CPU time while they wait for their requests to actually be done by the file system and/or disk driver) is the only way to improve the operating system's ability to optimize the order of IO requests.
Upvotes: 4