Reputation: 2779
I have a function that checks whether any file in a directory contains a given string:
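import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

boolean processFiles(File file, String phrase) throws IOException {
    if (file.isFile()) {
        return fileContains(file, phrase);
    }
    // Recurse into each directory entry and stop at the first match.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(file.toPath())) {
        for (Path entry : stream) {
            if (processFiles(entry.toFile(), phrase)) {
                return true;
            }
        }
    }
    return false;
}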
How can I use concurrency to improve performance when there are many directories, sub-directories and files?
I tried creating a thread for each sub-directory, but I run out of threads when there are many nested sub-directories.
Using a fixed-size thread pool is also problematic when there are many sub-directories. What is the best way to use threads here to improve performance?
Upvotes: 1
Views: 133
Reputation: 140427
"Using a fixed-size thread pool is also problematic when there are many sub-directories"
That is an assumption, and it is simply wrong.
You assume that the limiting factor is the number of threads. But what makes you think so? It is more likely that other parts of this operation limit the overall performance, such as operating system or file system activity. To be precise: the storage hardware underneath the file system.
You see, you can't make arbitrary problems go faster just by throwing an (unlimited) number of threads at them.
If you are serious about performance, stop making assumptions. Instead, start measuring. Test how much time 1 thread needs to "process" a larger tree. Do that repeatedly (most likely file system caching will play a big role here). Then see what changes if you use a fixed thread pool.
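A rough way to take such measurements is sketched below. The class name `SearchTimer` and the method `timeRuns` are made up for this illustration; the sketch only records wall-clock time per run, so treat it as a quick sanity check rather than a rigorous benchmark.

```java
public class SearchTimer {

    // Runs the task the given number of times and returns the elapsed
    // wall-clock time of each run in milliseconds. Keeping every run
    // separate makes cache effects visible (the first run is often slower).
    static long[] timeRuns(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }
}
```

Wrap the single-threaded search in a `Runnable`, call `timeRuns` with, say, 5 runs, and print the array with `java.util.Arrays.toString`. Repeat with each pool size you want to compare, on the same directory tree.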
My assumption is: you will see a certain speedup, but adding more threads will rather quickly slow things down again. Guessing here: a pool with 4, at most 8 threads might give you "optimal" results.
Implementation-wise, you could put "new" subdirectories that still need crawling on a queue, and have your worker threads take them from the queue for processing.
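A minimal sketch of that idea follows, assuming the pool's own internal task queue can serve as the work queue. Every path is submitted as a separate task, and no task ever waits for its children, so a small fixed pool cannot starve itself however deep the nesting is. The names `ParallelSearch`, `search` and `visit` are invented for this example, and `fileContains` is only a stand-in for the method from the question (it reads the whole file as UTF-8, which may not suit large files).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelSearch {

    // Stand-in for the question's fileContains (an assumption, not the original).
    static boolean fileContains(Path file, String phrase) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return new String(bytes, StandardCharsets.UTF_8).contains(phrase);
        } catch (IOException e) {
            return false; // unreadable files count as "no match" in this sketch
        }
    }

    // Returns true if any regular file under root contains the phrase.
    static boolean search(Path root, String phrase, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicBoolean found = new AtomicBoolean(false);
        AtomicInteger pending = new AtomicInteger(1); // counts the root task
        CountDownLatch done = new CountDownLatch(1);
        try {
            pool.execute(() -> visit(root, phrase, pool, found, pending, done));
            done.await(); // released on first match or when no tasks remain
        } finally {
            pool.shutdownNow();
        }
        return found.get();
    }

    private static void visit(Path path, String phrase, ExecutorService pool,
                              AtomicBoolean found, AtomicInteger pending,
                              CountDownLatch done) {
        try {
            if (found.get()) {
                return; // another task already found a match
            }
            if (Files.isRegularFile(path)) {
                if (fileContains(path, phrase)) {
                    found.set(true);
                    done.countDown();
                }
            } else if (Files.isDirectory(path)) {
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
                    for (Path entry : stream) {
                        pending.incrementAndGet(); // count the child before queuing it
                        try {
                            pool.execute(() ->
                                    visit(entry, phrase, pool, found, pending, done));
                        } catch (RejectedExecutionException e) {
                            pending.decrementAndGet(); // pool shut down after a match
                            break;
                        }
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // skip directories that cannot be read
                }
            }
        } finally {
            // The last task to finish releases the waiting caller.
            if (pending.decrementAndGet() == 0) {
                done.countDown();
            }
        }
    }
}
```

A call such as `ParallelSearch.search(Paths.get("/some/dir"), "phrase", 4)` would then crawl the tree with four workers; the path and the pool size are placeholders to be replaced with values from your own measurements.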
Upvotes: 3