Reputation: 21
I am working on a project based on parallel programming, where I need to execute a given task as efficiently as possible (in terms of time and energy consumption). For this I need to suspend some threads from the worker pool based on some conditions. These threads are created using pthread_create().
I have two types of worker pools: one stores the threads that are active, and the other stores the suspended threads. After identifying the thread to be suspended, I push its thread ID into my suspended thread pool and then suspend the thread using pthread_kill:
push_task_suspended(threadID);
int status = pthread_kill(threadID,SIGSTOP);
But I am getting a segmentation fault when using this. I have run gdb on this code; it shows the segmentation fault comes from pthread_kill.
Can you please tell me why I am getting this?
Upvotes: 0
Views: 512
Reputation: 383
I don't agree with previous posters that signals generally can't be used with threads. You just have to do it with caution.
The disadvantage of solutions using condition variables, mutexes, or semaphores is that the thread isn't paused immediately, and they require active checking on the part of the thread to be stopped. You can achieve an immediate stop using signals. You just can't use SIGSTOP.
pthread_kill(threadID, SIGSTOP) stops the entire process, and thus the caller too, which is probably not intended, and you can't do anything about that. The Linux pthread_kill(3) man page says "if a signal handler is installed, the handler will be invoked in the thread thread, but if the disposition of the signal is "stop", "continue", or "terminate", this action will affect the whole process", and the signal(7) man page specifies that the disposition of SIGSTOP can't be changed.
But you can use different signals instead, for example SIGUSR1 for "stop thread" and SIGUSR2 for "continue thread". You must catch these signals (to avoid the default disposition), and make sure that the to-be-stopped thread doesn't block them. Of course, the signals must be sent with pthread_kill().
The signal handler for the "stop thread" signal calls pause() or some other function that waits indefinitely. A later "continue thread" signal (in fact, any caught signal that isn't blocked) will interrupt the pause() call, the "stop thread" signal handler will return, and the thread will resume execution.
The general rule when using threads and signals is "block everything". Threads should unblock only the signals they strictly need to react on. Standard signals like SIGTERM, SIGINT, SIGCHLD, which might be sent to the application from outside, should be blocked in all but one dedicated thread. When a termination signal is received, this thread should take care of terminating the other threads, either using pthread_cancel() or by sending an application-defined signal.
Upvotes: 0
Reputation: 1772
I don't know why the pthread_kill(threadID, SIGSTOP) is crashing -- I guess threadID is not the pthread_t for the thread? -- but it's definitely not a good way of dealing with the problem!
Condition variables are a bit tricky, but worth understanding. I got a bit over excited here... but I hope it's useful.
Using your own 'task_suspended' queue -- with sem_t
Let's assume you have a mutex around the dequeuing of pending tasks and the enqueuing of idle workers. Then a worker going idle must:
loop:
lock(mutex)
.... look for task, but if none pending ....
enqueue(self) -- on task_suspended queue
unlock(mutex) -- (a)
suspend(self) -- (b)
goto loop
And when adding a task, the logic is:
lock(mutex)
enqueue(task) -- on task pending queue
if (worker-idle-queue-not-empty)
dequeue(worker)
desuspend(worker)
unlock(mutex)
In fact, the desuspend() does not need to be inside the mutex, but that's a minor matter.
What does matter is that the desuspend() must work even if it happens between the unlock() at (a) and the suspend() at (b). You could give every thread its own sem_t semaphore -- then suspend() is a sem_wait() and desuspend() is a sem_post(). [But, no, you cannot use a mutex for this !!]
Using a 'Condition Variable'
With your own 'task_suspended' queue you are reinventing a wheel.
As mentioned in the comments above, the tool provided for this job is the (so called) 'condition variable' -- pthread_cond_t.
The key to using 'condition variables' is to understand that they are absolutely not variables -- they do not have a value, they do not in any sense count the number of pthread_cond_wait() and/or pthread_cond_signal() calls... they are not a form of semaphore. Despite the name, a pthread_cond_t is best thought of as a queue of threads. And then:
pthread_cond_wait(cond, mutex) is, effectively:
  enqueue(self)        -- on 'cond'
  unlock(mutex)
  suspend(self)
  ....wait for signal...
  lock(mutex)
where by some magic the enqueue()+unlock()+suspend() are a single operation (as far as all threads are concerned), and then:
pthread_cond_signal(cond) is, effectively:
  if ('cond' queue-not-empty)
    dequeue(thread)    -- from 'cond'
    desuspend(thread)
where, also, by some magic that is all a single operation. [NB: pthread_cond_signal() is allowed to dequeue and desuspend more than one thread, see below.]
So now, for the worker thread we have:
lock(mutex)
loop:
.... look for task, but if none pending ....
pthread_cond_wait(cond, mutex)
goto loop
... if have task, pick it up ...
unlock(mutex)
and for task creation:
lock(mutex)
enqueue(task)
pthread_cond_signal(cond)
unlock(mutex)
where the cond takes the place of the explicit queue of pending threads.
Now, the pthread_cond_signal(cond) can be inside or outside the mutex. If inside, then conceptually, as soon as a thread is dequeued from the cond queue it will run and immediately block on the mutex -- which seems like a waste. But the implementation could do something clever, and simply transfer the restarted thread(s) from one queue to another.
Note that the task creator does not know how many suspended threads there are, nor does it care. POSIX says the pthread_cond_signal() function shall:
...unblock at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond).
...have no effect if there are no threads currently blocked on cond.
Note especially "unblock at least one of the threads". Again, it is a mistake to think of a cond as a variable. It is a mistake to think of a cond as (say) a "task ready" flag, or a count, or anything else you might think of as being a variable. It just isn't so. When a thread restarts after a pthread_cond_wait() what it was waiting for may or may not have occurred, and if it has, another thread may have got there first. This is why everything you read about (so called) 'condition variables' will talk of using them inside a loop, and returning to the top of the loop (just after the lock(mutex)) on return from the pthread_cond_wait().
NB: when a thread restarts after a pthread_cond_wait() it may be one of several restarted by a single pthread_cond_signal(), and yes it seems odd that POSIX allows that -- presumably either to fit with some historic implementation, or to allow for some simpler implementation (perhaps related to thread priority). But, even if pthread_cond_signal() did guarantee to restart just one thread, the restarted thread could regain the mutex after some other worker thread, thus:
Worker 1 | Worker 2 | Task Queue
busy | busy | empty
lock(mutex) | . | .
+ task queue empty | . | lock(mutex)
unlock(mutex) + | . | -
wait(cond) | . | -
~ | lock(mutex) | + enqueue task
~ | - | + signal(cond)
re-lock(mutex) | - | unlock(mutex)
- | + dequeue task | .
- | unlock(mutex) | empty
+ task queue empty ! | busy | .
where + is where the thread owns the mutex, - is where it is waiting for the mutex, and ~ is where it is waiting for the 'cond' to be signaled.
You could be worried about doing pthread_cond_signal(cond) every time a new task is enqueued... so you could do that only when the task queue was empty. You should be able to convince yourself that works -- particularly if done inside the mutex.
Using a sem_t, or a sem_t with a counter
Alternatively, you could use a sem_t to count the number of 'tasks - waiters'. Each time a new task is added to the queue, the semaphore is incremented (sem_post). Each time a worker completes a task, it reserves the next task or waits (sem_wait). You still need a safe way to enqueue and dequeue tasks -- say: lock(mutex), enqueue(task), unlock(mutex), post(sem); and: wait(sem), lock(mutex), dequeue(task), unlock(mutex).
The only difficulty here is that the maximum value of a semaphore can be as small as 32767 -- see sysconf(_SC_SEM_VALUE_MAX).
Or you could use one sem_t and a count of waiters. So, for the worker thread we have:
loop:
lock(mutex)
.... look for task, but if none pending ....
increment waiter-count
unlock(mutex)
sem_wait(sem)
goto loop
and for task creation:
lock(mutex)
enqueue(task)
kick = (waiter-count != 0)
if (kick)
decrement(waiter-count)
unlock(mutex)
if (kick)
sem_post(sem)
The sem_post() can be put inside the mutex -- but is better outside.
And you are OK unless you have more than 32767 worker threads (!).
But, when you un-pick this, you will see that this is (largely) reinventing pthread_cond_wait()/_signal(), and not likely to be any more efficient.
Upvotes: 2