danny

Reputation: 1299

thread scheduling and yielding

If I have 4 worker threads and 1 I/O thread running on a quad-core, one of the threads will have to share a core with another. How do I make sure that it is the input thread that always shares a core, so that I can call sched_yield() to give up its current time slice to the other thread? If it is two worker threads that share a core, a yield on the input thread will have no effect, right? Will sched_yield() pull in a thread from a different core anyway?

#include <sched.h>
#include <pthread.h>

void *test(void *arg) {
    (void)arg;          /* unused */
    while (1) {}        /* CPU-bound worker */
    return NULL;
}

int main(void) {
    pthread_t t;
    for (int i = 0; i < 4; i++)
        pthread_create(&t, NULL, test, NULL); /* workers */
    while (1) {
        sched_yield();  /* input thread gives up the rest of its time slice */
    }
    return 0;
}

Edit The input thread needs to poll for incoming messages. The library I am using (MPI) is not interrupt driven, and condition variables are useless in this context. What I want to do in the input thread is check for a condition once and then give up its time slice. If there are enough cores to run all the threads, the input thread will run on its own core. If not, it will run the minimum number of checks, i.e. once per time slice. I hope I am clear enough.
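Roughly what I have in mind for the input thread, as a sketch (assuming MPI_Iprobe is the right non-blocking way to check for a pending message; receiving/dispatching is left out):

#include <mpi.h>
#include <sched.h>

/* Sketch only: check once for a pending message, handle it if there is
 * one, otherwise give the rest of the time slice away. */
void input_loop(void) {
    for (;;) {
        int flag = 0;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        if (flag) {
            /* a message is waiting: receive and dispatch it here */
        } else {
            sched_yield(); /* give up the current time slice to a worker */
        }
    }
}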

Upvotes: 0

Views: 1335

Answers (2)

bazza

Reputation: 8414

Hmmm, MPI_Recv claims to be blocking unless you do something specific to change that. MPI's underlying comms infrastructure is elaborate, and I don't know whether 'blocking' extends as far as waiting on a network socket with a call to select(). You're sort of stating that it doesn't, which I can well believe given MPI's complexity.

MPI's Internals

So if MPI_Recv in blocking mode inevitably involves polling, one needs to work out exactly what the library underneath is doing. Hopefully it's a sensible poll (i.e. one involving a call to nanosleep()). You could look at the Open MPI source code for that (eek), or use this and GTKWave to see what its scheduling behaviour is like in a nice graphical way (I'm assuming you're on Linux).

If it is sleeping in the polling loop then the version of the Linux kernel matters. More modern kernels (possibly requiring the PREEMPT_RT patch set - I'm afraid I can't remember) do a proper timer-driven, de-scheduled sleep even for short periods, so taking no CPU time. Older implementations would just busy-loop for short sleeps, which is no good to you.

If it's not sleeping at all then it's going to be harder. You'd have to use MPI in a non-blocking mode and do the polling / sleeping yourself.
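For illustration, a minimal sketch of that do-it-yourself poll-and-sleep approach (assuming MPI_Iprobe for the non-blocking check; the 1 ms sleep is an arbitrary figure you'd tune):

#include <mpi.h>
#include <time.h>

/* Sketch: poll for a message without blocking, and de-schedule
 * ourselves for a short while when there's nothing to do. */
void poll_with_sleep(void) {
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 }; /* ~1 ms */
    for (;;) {
        int flag = 0;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        if (flag) {
            /* receive and handle the message here */
        } else {
            nanosleep(&pause, NULL); /* proper de-scheduled sleep */
        }
    }
}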

Thread Priorities

Once you've got either your code or MPI's polling with a sleep, you can then rely on thread priorities and the OS scheduler to sort things out. In general, putting the I/O thread at a higher priority than the worker threads is a good idea. It prevents the process at the other end of the I/O from being held up by your worker threads pre-empting your I/O thread. For this reason sched_yield() isn't a good idea, because the scheduler won't actually put your thread to sleep.
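As a rough illustration (Linux-specific; real-time policies like SCHED_FIFO generally need root or CAP_SYS_NICE, and the priority value here is just an example):

#include <pthread.h>
#include <sched.h>

/* Sketch: raise the calling (I/O) thread above the default SCHED_OTHER
 * workers, so it pre-empts them as soon as it becomes runnable. */
int raise_io_thread_priority(void) {
    struct sched_param sp = { .sched_priority = 10 };
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}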

Thread Affinity

In general I wouldn't bother with that, at least not yet. You've got 5 threads and 4 cores; one of those threads will always be disappointed. If you let the kernel sort things out as best it can then, provided you've got control of the polling (as described above), you should be fine.

--EDIT--

I've gone and had another look at MPI and threads, and re-discovered why I didn't like it. MPI intercommunicates between processes, each of which has a 'rank'. Whilst MPI is/can be thread-safe, a thread in itself doesn't have its own rank, so MPI is not capable of intercommunicating between threads. That's a bit of a weakness in MPI in these days of multi-core devices.

However, you could have 4 separate processes and no I/O thread. That's likely to be less than optimal in terms of how much data is copied, moved and stored (it'll be 4x the network traffic, 4x the memory used, etc.). However, if you've got a large enough compute-time:I/O-time ratio, you might be able to stand that inefficiency for the sake of simple source code.

Upvotes: 1

Nathan

Reputation: 1288

The Googleable phrase you are looking for is "CPU affinity".

See for example this SO question.

Ensuring that each of the worker threads is running on a different core will achieve your stated goal.
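For example, on Linux something along these lines would pin a thread to a particular core (pthread_setaffinity_np is a GNU extension, so this is a sketch rather than portable code):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Sketch: restrict thread 't' to run only on the given core index. */
int pin_thread_to_core(pthread_t t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t, sizeof(cpu_set_t), &set);
}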

I think a number of commenters have raised some legitimate concerns about the design of your application, and you might want to continue those conversations just to make sure the design you have in your head will actually accomplish the end goal you want to achieve.

Upvotes: 1
