Reputation: 63190
When you have a situation where Thread A reads some global variable and Thread B writes to the same variable, then, assuming the read/write is atomic on a single core, you can do it without synchronizing. But what happens when running on a multi-core machine?
Upvotes: 4
Views: 2226
Reputation: 3264
No one has mentioned the pros and cons of implicit synchronization.
The main "pro" is of course that the programmer can write anything at all and not have to bother about synchronization.
The main "con" is that this takes A LOT of time. Implicit synchronization would need to wind its way down through the caches, at least (you might think) to the first cache common to both cores. Wrong! There may be several physical processors installed in the computer, so synchronization can't stop at a cache: it needs to go all the way down to RAM. And if you synchronize there, you also need to synchronize with every other device that accesses memory, i.e. any bus-mastering device. Bus-mastering devices may be cards on the classic PCI bus running at 33 MHz, so the implicit synchronization would need to wait for them too to acknowledge that it's OK to write to or read from a specific RAM location. That's a 100x difference in clock speed alone between the core and the slowest bus, and the slowest bus needs several of its own bus cycles to react reliably. Synchronization MUST be reliable; it is of no use otherwise.
So in the choice between implementing electronics for implicit synchronization (which is better left to the programmer to handle explicitly anyway) and a faster system which can synchronize when necessary, the answer is obvious.
The explicit keys to synchronization are the LOCK prefix and the XCHG mem,reg instruction.
You could say that implicit synchronization is like training wheels: you won't fall to the ground but you can't go especially fast or turn especially quickly. Soon you'll tire and want to move on to the real stuff. Sure, you'll get hurt but in the process you'll either learn or quit.
Upvotes: 0
Reputation: 32635
As far as the (new) C++ standard is concerned, if a program contains a data race, the behavior of the program is undefined. A program has a data race if there is an interleaving of threads such that it contains two neighboring conflicting memory accesses from different threads (which is just a very formal way of saying "a program has a data race if two conflicting accesses can occur concurrently").
Note that it doesn't matter how many cores you're running on, the behavior of your program is undefined (notably the optimizer can reorder instructions as it sees fit).
Upvotes: 0
Reputation: 247899
Even on a single-core machine, there is absolutely no guarantee that this will work without explicit synchronization.
There are several reasons for this: the compiler may reorder, combine, or cache accesses to a variable it doesn't know is shared; the OS can preempt a thread in the middle of a multi-instruction operation; and the CPU itself may buffer and reorder memory operations.
If you want correct communication between two threads, you need some kind of synchronization. Always, with no exception.
That synchronization may be a mutex provided by the OS or the threading API, or it may be CPU-specific atomic instructions, or just a plain memory barrier.
Upvotes: 5
Reputation: 881113
Even on a single core, you cannot assume that an operation will be atomic. That might be the case if you're coding in assembler but, if you are coding in C++ as per your question, you do not know what it will compile down to.
You should rely on the synchronisation primitives at the level of abstraction that you're coding to. In your case, that's the threading calls for C++, whether they're pthreads, Windows threads or something else entirely.
It's the same reasoning that I gave in another answer to do with whether i++ was thread-safe. The bottom line is, you don't know since you're not coding to that level (if you're doing inline assembler and/or you understand and can control what's going on under the covers, you're no longer coding at the C++ level and you can ignore my advice).
The operating system and/or OS-type libraries know a great deal about the environment they're running in, far more so than the C++ compiler would. Use of proper synchronisation primitives will save you a great deal of angst.
Upvotes: 9
Reputation: 27212
Depending on your situation the following may be relevant. While it won't make your program run incorrectly, it can make a big difference in speed: even if you aren't accessing the same memory location, you may get a performance hit due to cache effects if two cores are thrashing over the same cache line (though not the same location, because you carefully synchronized your data structures).
There is a good overview of "false sharing" here: http://www.drdobbs.com/go-parallel/article/showArticle.jhtml;jsessionid=LIHTU4QIPKADTQE1GHRSKH4ATMY32JVN?articleID=217500206
Upvotes: 0
Reputation: 19024
It will have the same pitfalls as with a single core but with additional latency due to the L1 cache synchronization that must take place between cores.
Note - "you can do it without synchronizing" is not always a true statement.
Upvotes: 7
Reputation: 49534
For a non-atomic operation on a multi-core machine, you need to use a system-provided mutex in order to synchronize the accesses.
For C++, the boost mutex library provides several mutex types that provide a consistent interface for OS-supplied mutex types.
If you choose to look at boost as your syncing / multithreading library, you should read up on the Synchronization concepts.
Upvotes: 1