Reputation: 8486
I'm writing a program which needs to recursively search through a folder structure, and would like to do so in parallel with several threads.
I've already written the rather trivial synchronous method: add the root directory to the queue, then dequeue a directory, queue its subdirectories, and so on until the queue is empty. I'll use a ConcurrentQueue<T> for my queue, but I have already realized that my loops will stop prematurely. The first thread will dequeue the root directory, and every other thread could immediately see that the queue is empty and exit, leaving the first thread as the only one running.
I would like each thread to loop until the queue is empty, then wait until another thread queues more directories, and keep going. I need some sort of checkpoint in my loop so that no thread exits until every thread has reached the end of the loop, but I'm not sure of the best way to do this without deadlocking when there really are no more directories to process.
Upvotes: 3
Views: 3076
Reputation: 4592
If you want to stick with an explicit queue, have a look at the BlockingCollection<T> class. Its GetConsumingEnumerable() method returns an IEnumerable<T> that blocks when the collection runs out of items and continues as soon as new items are available. So whenever the collection is empty, the consuming thread blocks instead of stopping prematurely. The enumeration only ends once CompleteAdding() has been called and the collection has been drained.
However, this is mainly useful for producer-consumer scenarios, and I am not sure whether your problem falls into that category.
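A minimal sketch of how this could look for the directory walk, where every worker is both consumer and producer. The `QueueWalker` class, the `CountDirectories` method, and the `pending` counter are hypothetical names introduced for illustration. The counter is one possible way to decide when to call CompleteAdding(): it tracks directories that are queued or still being processed, and the worker that brings it to zero closes the collection so that all blocked workers exit.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class QueueWalker
{
    // Hypothetical helper: visits every directory under root with
    // workerCount consumers and returns how many directories were visited.
    public static int CountDirectories(string root, int workerCount)
    {
        int pending = 1;   // directories queued or in progress (1 = the root)
        int visited = 0;

        using (var queue = new BlockingCollection<string>())
        {
            queue.Add(root);

            var workers = Enumerable.Range(0, workerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding().
                    foreach (var dir in queue.GetConsumingEnumerable())
                    {
                        Interlocked.Increment(ref visited);
                        try
                        {
                            foreach (var sub in Directory.GetDirectories(dir))
                            {
                                // Count the item before it becomes visible to others.
                                Interlocked.Increment(ref pending);
                                queue.Add(sub);
                            }
                        }
                        catch (UnauthorizedAccessException) { /* skip unreadable folder */ }
                        catch (IOException) { /* folder vanished or is unreadable */ }

                        // Nothing queued and nothing in progress: release everyone.
                        if (Interlocked.Decrement(ref pending) == 0)
                            queue.CompleteAdding();
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            Task.WaitAll(workers);
        }
        return visited;
    }
}
```

The LongRunning option is an assumption on my part: these workers spend much of their time blocked, so it seemed reasonable to keep them off the regular pool threads.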
Upvotes: 2
Reputation: 109005
Use the Task Parallel Library.
Create a Task to process the first folder. Inside it, create a Task to process each subfolder (recursively) and a task for each relevant file. Then wait on all the tasks for this folder.
The TPL runtime will make use of the thread pool, which avoids creating threads, an expensive operation for small pieces of work.
Note:
1. As I understand it, when a thread waits on tasks using a TPL method, the TPL will reuse that thread to run other tasks until the wait is fulfilled.
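The approach described above could be sketched roughly as follows. `TaskWalker`, `Walk`, and the `onFile` callback are hypothetical names chosen for illustration, and the callback is assumed to be thread-safe because it may run on several threads at once. Folders that cannot be read are simply skipped in this sketch.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class TaskWalker
{
    // Hypothetical helper: processes dir, then blocks until every task
    // started for its subfolders and files has finished.
    public static void Walk(string dir, Action<string> onFile)
    {
        string[] subDirs, files;
        try
        {
            subDirs = Directory.GetDirectories(dir);
            files = Directory.GetFiles(dir);
        }
        catch (UnauthorizedAccessException) { return; }  // skip unreadable folder
        catch (IOException) { return; }                  // folder vanished or is unreadable

        // One task per subfolder (recursive) and one per file.
        var folderTasks = subDirs.Select(sub => Task.Run(() => Walk(sub, onFile)));
        var fileTasks = files.Select(file => Task.Run(() => onFile(file)));

        // Wait on all the tasks for this folder before returning.
        Task.WaitAll(folderTasks.Concat(fileTasks).ToArray());
    }
}
```

Because each call returns only after its children are done, the top-level `Walk(root, ...)` call returning means the whole tree has been processed, so no separate "is the queue really empty" check is needed. If `onFile` throws, `Task.WaitAll` surfaces the failure as an `AggregateException`.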
Upvotes: 5
Reputation: 16757
It seems that your best bet here is to start with one thread and then, whenever you load subdirectories, hand them to threads from the thread pool. Let each thread exit when it is done, and request new ones from the pool every time you go one level deeper into the directories. This way there is no deadlock and your system uses threads as it needs them. You could even decide how many threads to start based on how many folders were found.
Edit: Changed the above to make it clearer that you don't want to explicitly create new threads; instead, take advantage of the thread pool to add and remove threads as needed without the overhead.
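One way this might be written, as a sketch only: each directory becomes a thread pool work item, and a CountdownEvent tells the caller when all of them have finished. `PoolWalker` and `CountDirectories` are hypothetical names, and using CountdownEvent for completion tracking is my assumption, since the answer does not say how the caller learns that the walk is done.

```csharp
using System;
using System.IO;
using System.Threading;

static class PoolWalker
{
    // Hypothetical helper: queues one pool work item per directory and
    // returns the number of directories visited once all items are done.
    public static int CountDirectories(string root)
    {
        int visited = 0;

        // Starts at 1 to account for the root work item.
        using (var outstanding = new CountdownEvent(1))
        {
            WaitCallback visit = null;
            visit = state =>
            {
                var dir = (string)state;
                try
                {
                    Interlocked.Increment(ref visited);
                    foreach (var sub in Directory.GetDirectories(dir))
                    {
                        // Register the child before queuing it; the count is
                        // still at least 1 here because this item has not signaled.
                        outstanding.AddCount();
                        ThreadPool.QueueUserWorkItem(visit, sub);
                    }
                }
                catch (UnauthorizedAccessException) { /* skip unreadable folder */ }
                catch (IOException) { /* folder vanished or is unreadable */ }
                finally
                {
                    outstanding.Signal();  // this directory is finished
                }
            };

            ThreadPool.QueueUserWorkItem(visit, root);
            outstanding.Wait();  // returns when the count reaches zero
        }
        return visited;
    }
}
```

No worker ever waits on another here, which is why the deadlock concern from the question does not arise: a work item finishes as soon as its own directory is listed, and only the calling thread blocks.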
Upvotes: 1