omega
omega

Reputation: 43953

thread overhead performance

When programming in C using threads, in a Linux shell, I am trying to reduce the thread overhead, basically lower CPU time (and making it more efficient).

Now in the program lots of threads are being created and need to do a job before it terminates. Only one thread can do the job at the same time because of mutual exclusion.

I know how long a thread will take to complete a job before it starts

Other threads have to wait while there is a thread doing that job. The way they check if they can do the job is if a condition variable is met.

For waiting threads, if they wait using that condition variable, using this specific code to wait (the a, b, c, and d is just arbitrary stuff, this is just an example):

while (a == b || c != d){
    pthread_cond_wait(&open, &mylock);
}

How efficient is this? Whats happening in the pthread_cond_wait code? Is it a while loop (behind the scenes) that constantly checks the condition variable?

Also since I know how long a job a thread will take, is it more efficient that I enforce a scheduling policy about shortest jobs first? Or does that not matter since, in any combination of threads doing the job, the program will take the same amount of time to finish. In other words, does using shortest job first lower CPU overhead for other threads doing the waiting? Since the shortest job first seems to lower waiting times.

Upvotes: 2

Views: 1144

Answers (2)

autistic
autistic

Reputation: 15642

Solve your problem with a single thread, and then ask us for help identifying the best place for exposing parallelisation if you can't already see an avenue where the least locking is required. The optimal number of threads to use will depend upon the computer you use. It doesn't make much sense to use more than n+1 threads, where n is the number of processors/cores available to your program. To reduce thread creation overhead, it's a good idea to give each thread multiple jobs.

The following is in response to your clarification edit:

Now in the program lots of threads are being created and need to do a job before it terminates. Only one thread can do the job at the same time because of mutual exclusion.

No. At most n+1 threads should be created, as described above. What is it you mean by mutual exclusion? I consider mutual exclusion to be "Only one thread includes task x in it's work queue". This means that no other threads require locking on task x.

Other threads have to wait while there is a thread doing that job. The way they check if they can do the job is if a condition variable is met.

Give each thread an independent list of tasks to complete. If job x is a prerequisite to job y, then job x and job y would ideally be in the same list so that the thread doesn't have to deal with thread mutex objects on either job. Have you explored this avenue?

while (a == b || c != d){ pthread_cond_wait(&open, &mylock); } How efficient is this? Whats happening in the pthread_cond_wait code? Is it a while loop (behind the scenes) that constantly checks the condition variable?

In order to avoid undefined behaviour, mylock must be locked by the current thread before calling pthread_cond_wait, so I presume your code calls pthread_mutex_lock to acquire the mylock lock before this loop is entered.

  1. pthread_mutex_lock blocks the thread until it acquires the lock, which means that one thread at a time can execute the code between the pthread_mutex_lock and pthread_cond_wait (the pre-pthread_cond_wait code).
  2. pthread_cond_wait releases the lock, allowing some other thread to run the code between the pthread_mutex_lock and the pthread_cond_wait. Before pthread_cond_wait returns, it waits until it can acquire the lock again. This step is repeated adhoc while (a == b || c != d).
  3. pthread_mutex_unlock is later called when the task is complete. Until then, only one thread at a time can execute the code between the pthread_cond_wait and the pthread_mutex_unlock (the post-pthread_cond_wait code). In addition, if one thread is running pre-pthread_cond_wait code then no other thread can be running post-pthread_cond_wait code, and visa-versa.

Hence, you might as well be running single-threaded code that stores jobs in a priority queue. At least you wouldn't have the unnecessary and excessive context switches. As I said earlier, "Solve your problem with a single thread". You can't make meaningful statements about how much time an optimisation saves until you have something to measure it against.

Also since I know how long a job a thread will take, is it more efficient that I enforce a scheduling policy about shortest jobs first? Or does that not matter since, in any combination of threads doing the job, the program will take the same amount of time to finish. In other words, does using shortest job first lower CPU overhead for other threads doing the waiting? Since the shortest job first seems to lower waiting times.

If you're going to enforce a scheduling policy, then do it in a single-threaded project. If you believe that concurrency will help you solve your problem quickly, then expose your completed single-threaded project to concurrency and derive tests to verify your beliefs. I suggest exposing concurrency in ways that threads don't have to share work.

Upvotes: 4

evil otto
evil otto

Reputation: 10582

Pthread primitives are generally fairly efficient; things that block usually consume no or negligible CPU time while blocking. If you are having performance problems, look elsewhere first.

Don't worry about the scheduling policy. If your application is designed such that only one thread can run at a time, you are losing most of the benefits of being threaded in the first place while imposing all of the costs. (And if you're not imposing all the costs, like locking shared variables because only one thread is running at a time, you're asking for trouble down the road.)

Upvotes: 2

Related Questions