S22

Reputation: 121

purpose of memory barriers in linux kernel

Robert Love says that "set_task_state(task, state) sets the given task to the given state. If applicable, it also provides a memory barrier to force ordering on other processors (this is only needed on SMP systems); otherwise it is equivalent to task->state = state."

My question is: how can a memory barrier force ordering on other processors?

What does Robert Love mean by this? Why is it required, and what is the ordering he might be talking about? Is he talking about scheduling queues here?

If so, does every processor in an SMP system have a different scheduling queue? I am confused.

Upvotes: 5

Views: 2341

Answers (2)

Paul Rubel

Reputation: 27222

Your CPU, to squeeze out extra performance, does out-of-order execution, which can run operations in a different order than they are given in the code. An optimizing compiler can also change the order of operations to make code faster. Compiler writers and kernel developers have to take care not to break expectations (or at least to conform to the spec, so they can say your expectation isn't right).

Here's an example:

1: CPU1: task->state = someModifiedStuff
2: CPU1: changed = 1;
3: CPU2: if (changed)
4: CPU2:  ...

If we didn't have a barrier for setting the state, lines 1 and 2 could be reordered. Since neither references the other, a single-threaded execution wouldn't see any difference. However, in an SMP situation, if we reordered 1 and 2, line 3 could see changed set but not the state change. For example, if CPU1 ran line 2 (but not 1) and then CPU2 ran lines 3 and 4, CPU2 would be running with the old state, and if it then cleared changed, the change that CPU1 just made would be lost.

A barrier tells the system that, at some point between 1 and 2, it must make things consistent before moving on.

Do a search on 'memory barrier' and you'll find some good posts, e.g. Memory Barriers Are Like Source Control Operations.

Upvotes: 3

Sigi

Reputation: 4926

Memory barriers are required because current CPUs perform a lot of out-of-order execution: they load many instructions at a time and, if there are no dependencies among them, execute them in a non-deterministic order.

To avoid reordering due to compiler optimization, the volatile keyword is sufficient (speaking of C++ here). So a synchronization primitive (e.g. a lock) is implemented both by properly using volatile and by some kind of assembler fence instruction (there are many of them, more or less strong: see section 7.5.5 in http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html).

You know what a lock is?

x = 0;

thread 1:                           thread 2:

a.lock();                           a.lock();
x++;                                x++;
a.unlock();                         a.unlock();

x will correctly end up being 2. Now suppose there were no guarantee on the order in which the instructions of these two threads execute. What if the executed instructions were the following (a and x are independent, so out-of-order execution would be allowed if lock() weren't properly implemented with memory barriers):

x = 0;

thread 1:                           thread 2:

x++;                                x++;
a.lock();                           a.lock();
a.unlock();                         a.unlock();

x could end up equal to either 2 or 1.

Upvotes: 2
