code_fodder

Reputation: 16341

Is it more efficient to mutex-lock a variable multiple times in a code block, or just to lock the whole code block?

If we have the following code example:

if (letsDoThis) //bool
{
    sharedVar++; // This is shared across other threads.
          :
    sharedVar++;
          :
    sharedVar++;
          :
    sharedVar++;
          :
    sharedVar++;
}

Where each : stands for roughly 10 lines of code (no slow function calls or anything like that). Would it be faster to lock a mutex around the whole if block (well, the contents of the if block), or to lock and unlock around each individual sharedVar access?

If it is an "it depends" type of question, then which approach (as a rule of thumb, perhaps) is better to start with?

Finally, how can you determine which of the two runs faster on your system? Would a trace tool really show you useful data in a meaningful way?

Upvotes: 3

Views: 986

Answers (2)

MikeMB

Reputation: 21156

This depends (among all the other things that impact performance) on

  • How many threads and cores you have (the minimum of the two is what matters)
  • How much time threads spend in that part of the code (and other parts that lock the mutex)

From the perspective of a single thread, multiple locks and unlocks mean additional work and are, especially under contention, rather expensive and time-consuming, and they will probably lead to more cache ping-pong between the cores. So they will slow down an individual thread. However, if locking and unlocking multiple times reduces the total time a thread holds the mutex in each iteration, there can be more parallelism in your program, and the overall performance scales better with the number of threads and CPU cores.

Both effects might be negligible, both effects might be significant, and if the code isn't in the hot path of your program, it might not matter in the first place. I think the only thing you can do to determine what is better in your case is to just run both variants and measure overall throughput. If you don't see a difference, I'd probably go with a single lock and unlock (i.e. with a std::lock_guard) for simplicity.
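For illustration, here is a minimal sketch of the two variants being compared. doWork() stands in for the ~10 lines of code between increments, and the mutex m is assumed to guard sharedVar; the names are illustrative, not from the question.

#include <mutex>

std::mutex m;
int sharedVar = 0;

void doWork() { /* stands in for the ~10 lines between increments */ }

// Variant A: one lock around the whole block.
void coarse()
{
    std::lock_guard<std::mutex> lock(m); // held until the end of the scope
    sharedVar++;
    doWork();
    sharedVar++;
    doWork();
    sharedVar++;
}

// Variant B: lock and unlock around each individual access.
void fine()
{
    { std::lock_guard<std::mutex> lock(m); sharedVar++; }
    doWork();
    { std::lock_guard<std::mutex> lock(m); sharedVar++; }
    doWork();
    { std::lock_guard<std::mutex> lock(m); sharedVar++; }
}

Variant A keeps doWork() inside the critical section; variant B lets other threads run doWork() in parallel, at the cost of extra lock traffic.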

The more general question you should ask, however, is whether you really need this much synchronization between the threads, and whether you have to synchronize multiple times: if it is OK that other threads don't see intermediate values of the shared state (which you can't guarantee anyway unless they wait for each other), why not combine all the operations on the shared state into a single operation at the end of the block?
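A sketch of that idea, assuming the intermediate counter values really don't need to be visible to other threads (local is an illustrative name):

#include <mutex>

std::mutex m;
int sharedVar = 0;

void combinedUpdate()
{
    int local = 0;      // private to this thread, no locking needed
    local++;
    // ... ~10 lines of work ...
    local++;
    // ... ~10 lines of work ...
    local++;

    std::lock_guard<std::mutex> lock(m);
    sharedVar += local; // publish the combined result once
}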

And of course, if your shared state really is just a single integer, then you should use an atomic and get rid of the mutex altogether.
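That could be as simple as the following sketch; note that plain sharedVar++ on a std::atomic defaults to the strongest (sequentially consistent) ordering:

#include <atomic>

std::atomic<int> sharedVar{0};

void increment()
{
    sharedVar++; // an atomic read-modify-write, no mutex required
}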

Upvotes: 3

bobah

Reputation: 18864

If, from the program logic's point of view, the code between subsequent increments is allowed to run concurrently, then your best course is to use an atomic counter. If whatever is done before a counter increment needs to be immediately visible to other threads, then increment with release semantics (an atomic increment plus a barrier); otherwise, increment with relaxed semantics (just the atomic increment).
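In C++11 terms, the two flavours could look like this (counter is an illustrative name):

#include <atomic>

std::atomic<int> counter{0};

void incrementWithRelease()
{
    // Writes made before this increment become visible to any thread
    // that reads the new value with acquire semantics.
    counter.fetch_add(1, std::memory_order_release);
}

void incrementRelaxed()
{
    // Just the atomic increment; no ordering guarantees for
    // surrounding memory operations.
    counter.fetch_add(1, std::memory_order_relaxed);
}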

If for whatever reason an atomic increment is not an option, then benchmark your mutex ideas in a simple test application that runs the subject code concurrently in a loop. Google Benchmark is a nice little library that can save you some typing. If you only have raw POSIX threads, you can borrow my old code.
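A minimal harness along those lines, assuming Google Benchmark is installed (the function names and the thread count of 4 are illustrative):

#include <benchmark/benchmark.h>
#include <mutex>

std::mutex m;
int sharedVar = 0;

static void BM_CoarseLock(benchmark::State& state)
{
    for (auto _ : state) {
        std::lock_guard<std::mutex> lock(m); // one lock for all increments
        sharedVar++;
        sharedVar++;
    }
}
BENCHMARK(BM_CoarseLock)->Threads(4); // run concurrently on 4 threads

static void BM_FineLock(benchmark::State& state)
{
    for (auto _ : state) {
        { std::lock_guard<std::mutex> lock(m); sharedVar++; }
        { std::lock_guard<std::mutex> lock(m); sharedVar++; }
    }
}
BENCHMARK(BM_FineLock)->Threads(4);

BENCHMARK_MAIN();

Comparing the reported throughput of the two benchmarks gives you the measurement directly.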

Upvotes: 1
