Thread pool where the tasks are relatively large

Question

I have created a thread pool in C. Each thread in the pool performs exactly the same function. Several producer threads put fresh data into this pool's queue using a fairly standard mutex/cond method.

The fresh data is always relatively large, and the amount of processing that has to be performed can take quite awhile. Whenever I have seen something like this implemented the workers lock the queue while they either copy the required data or perform their requested task. In my case either of these operations may take awhile, and during that time other threads would be blocked from accessing the queue.

How should I go about announcing that a particular thread has taken a task but that the data associated with that task is still being used? Is it practical to add some sort of 'in process' flag to the queue and let the worker thread unlock the queue while it works?

Jonathan Leffler · Accepted Answer

Revise the queuing structure so that what's queued is just a pointer to the data to be processed.

When a thread needs to grab a job, it grabs the mutex, gets the next pointer to the task to be performed, possibly zaps the in-queue copy of the pointer or otherwise makes sure that other threads won't process what it is processing, and then releases the mutex (or signals a condition or whatever). The access to the big chunk of data is via the pointer; only the one thread has the pointer; it cleans up when it is done, but it operates secure in the knowledge that no other thread is working on the same data.

So, you finesse the problem of big chunks of data by not using them in the queue — you use little chunks of data, aka a pointer to the big chunk.

Currently my queue is a queue of pointers. However, I malloc a finite number of buffers at the beginning of the program. Correct me if I am wrong, but what you are suggesting is that I instead have the producer threads malloc memory as required, and have workers free it when they finish?

I'm not suggesting that you have producers use malloc() and consumers use free().

What I'm suggesting is that you organize the queue so that there is no question of either a producer or a consumer needing to lock it for any extended period of time. However, if your queue is already a queue of pointers, I don't understand how you've got problems in the first place.

It probably comes down to terminology — that great causer of confusion.

What I'm trying to suggest is that there is a queue of 'tasks waiting to be consumed'. Sometimes, that queue will be empty; then the consumer threads will wait on a condition 'queue not empty', and when a producer adds a task to the queue, one of the waiting consumers will be woken and will take over running the new task. But the consumer's initial step will simply be to remove the task from the 'waiting to be consumed' queue.

The information on the queue must be sufficient to identify what needs doing — which may be simply a pointer to a 'task description', which contains further pointers to where the data to be operated on is stored. Removing the 'task description' from the queue should be (must be) a quick, simple operation (protected by mutexes and conditions). A given task description is not accessed by multiple threads concurrently The data that it points at is not accessed by other threads (in general). If there is data shared between threads, then concurrent access to that shared data must be coordinated as usual.

But the key design point is that a consumer thread spends minimal time blocking either other consumer threads or the producer threads while manipulating the queue. It gains access to the queue, removes the first item on the queue, and releases access to the queue. It then gets on with processing the task that needs to be done — with no interference from producers or other consumers.

Likewise, a producer thread prepares the task description and ensures that the relevant (pre-allocated) buffer is used, etc — handling the acquisition of the buffer carefully, etc. But when the task description is ready, the producer spends a very short time acquiring access to the queue, adding the task description to the queue, releasing access to the queue, signalling the 'queue not empty' condition.

Thread pool where the tasks are relatively large

Answers (1)

Related Questions