PeterK
PeterK

Reputation: 6317

Implementing a simple thread pool

I'm currently in the need of a simple and efficient thread pool implementation. I have searched here and also on Google and found numerous interesting links, but nothing i've found so far seems to be suitable. Most of the implementations i have found on the web are either too complicated or lack some of the key features i need.

Also i don't want to use code that i do not understand, so i decided to code it myself (sometimes reinventing the wheel helps me push myself forward in terms of knowledge and experience). I of course understand the basic idea behind thread pool, but some implementation details are still somewhat unclear to me. This is probably because the kind of thread pool i need is a bit special. Let me describe it. I have a task that is done hundreds of thousands of times on a particular (large) buffer. I have measured that the performance is much better if I use threads for this task - the buffer is split into sub-buffers and each thread performs its task on the sub-buffer and returns the result. All the results from all threads are then added together, giving me the final solution.

However since this is done very often i'm losing precious time because of so many threads being created (because of the overhead that comes with thread creation). So i would like to have a pool of threads that would perform this task, instead of creating a new set of threads every time.

To be more clear, this is what i have so far:

What i would like to achieve is this:

As you can see, this is a bit of a special thread pool, since i need to wait for the threads to finish. Basically i want to get rid of the overhead of creating threads all the time, since the program goes through hundreds of thousands of iterations so it can create&destroy milions of threads over its lifetime. Good news is that i do not need any synchronization at all between threads, they all get their own data and storage place fot the results. However i must wait until all threads are finished and i have the final solution, because the next task depends on the results of the previous task.

My main problem is with the management of threads:

I will be grateful for any help. Also feel free to ask questions if i was not clear enough. Thanks!

Upvotes: 2

Views: 2739

Answers (4)

Tony The Lion
Tony The Lion

Reputation: 63190

how do i make my threads "sleep" and wake them up once new task is ready?

You use a mutex or semaphore, depending on your situation (in some cases you might need to use a condition variable or auto/manual resetevent), to make threads wait on each other or wake up when something happens.

how do i wait for all the threads to finish?

You have to use join() on each thread to wait for it to finish. So if you have a collection of threads, you might want to go through and call join on each thread that is still running.

As a seperate note: Thread pools do already exist and you can just use things like Boost.Threadpool instead of reinventing the wheel.

Upvotes: 0

Martin James
Martin James

Reputation: 24847

'As you can see, this is a bit of a special thread pool, since i need to wait for the threads to finish.' - not completely. You want the thread that processes the last task in your job to provide a job completion notification. Completion notification is a normal function of a theadPool, else the originating thread would be unable to process a complete set of results. Pools are often processing more than one task/task-hierarchy at the same time and so the completion notification method should be thread-agnostic - no join() or any of that kind of stuff. Also, no WaitForMultipleObject() - uses a synchro object array that is difficult to manage and is restricted to 64 objects.

Thread pools typically have a pool of threads waiting on a producer-consumer queue for tasks. The tasks are usually inherited from some 'Ctask' class that proides threadpool services. A mechanism for completion deection and notification is one of them.

A producer-consumer queue is essentially a 'normal' queue class protected from multiple access by a mutex and with a semaphore for counting the tasks in the queue and for the threads to wait on. The pool threads are each passed this queue and they loop forever, waiting on the queue semaphore, popping tasks from the locked queue then and callling the run() method of the tasks received.

Each task would have the thread pool loaded as a data member as it is submitted to the pool. This allows a task to submit more tasks if required.

Completion of each task is usually notified somewhere by calling an event method that is a member of the task, loaded by the originating thread before the task is submitted to the pool.

A task should also have a sub-task atomic count-down integer and an event to wait on for completion of other tasks.

How might this work in your example? You could have a 'primary' task that submits array-processing tasks and waits for them all to complete.

There should be more threads in teh pool than there are cores. I suggest twice as many.

The array needs to be split so that a separate task is used for each section. How many tasks - enough so that the available cores are all used up but not so many that excessive context-switches are generated. For any array of reasonable size, lets say that 64 tasks is a reasonable split - more than the typical number of processors available. Also, the tasks should not be split sequentially to avoid false-sharing.

So, this 'primary' array-processing task. Load it up with the array reference and set its completion event to point at some method that signals an event. Submit the task to the pool, wait on the event.

The tasks gets loaded onto a thread. Its run() method uses two loops and creates 32 array-processing tasks, each with its own start index and length into the array, but with non-consecutive start-indexes. The task uses its own inherited submit() method to load each of the 32 new tasks onto the pool. As well as actually queueing up the tasks for execution on the threads, this submit() also atomic-increments a completion count integer and sets the completion event of the tasks to a private completion event before queueing the task. The private completion event atomic-decrements the completion-count and signals an event if zero. After submitting all 32 array-processing events, the primary task waits on the private completion event.

So, the 32 array-processing tasks run on the threads. As each completes, the thread that is running it calls its completion event which decrements the completion count integer in the primary task. Eventually, the last array-processing task completes and the completion count integer is decremented to zero, so signaling the event which the thread running the primary task is waiting on. The primary task calls its own completin event, so signaling the event upon which the primary task originator is waiting.

The primary task originator runs on with the array completely processed.

..or, you could use a threadPool class that already works, as suggested by others.

Upvotes: 0

stefaanv
stefaanv

Reputation: 14392

For me the preferred way for communicating with threads is via condition variables. Because you can define the needed condition and signal when it changes. In your case, you can combine it with a queue with which the sub-buffers are passed, so each thread waits while the queue is empty. The result can then be put on another queue where the managing queue is waiting until all threads have posted the result to the queue (the reference to this queue is passed as a request together with the sub-buffers).

Upvotes: 2

X-Istence
X-Istence

Reputation: 16667

Have you looked at other threadpool implementations? Such as http://threadpool.sourceforge.net/ for example. What you want to accomplish is not exactly new. One way of making threads wait on a new task is to block on a mutex and unblock that mutex when another task is ready. You can also have the threads notify they are done using some sort of notification from the thread back to the parent.

In my line of work I have been using thread pools/threads heavily and have been using ØMQ for communication across threads, this allows the thread to block on a read() request from ØMQ when it is ready for new work.

With a little bit of research and with a little bit of time and effort you should be able to figure out how to either build or utilise existing frameworks/tools to build what you need. Then you can come back to SO when you have some code you are having issues with.

Upvotes: 1

Related Questions