Reputation: 2038
I know that modern CPUs use instruction pipelining: the execution of each machine instruction is split into several stages, as in the classic five-stage RISC pipeline. My question is whether the assembly instruction inc rax
is atomic when it is executed by different threads. Is it possible that thread A is in the Execute (EX) stage, incrementing the current value in register rax by 1, while thread B is in the Instruction Decode (ID) stage, reading a value of rax that thread A has not incremented yet? In that case, would there be a data race between threads A and B?
Upvotes: 0
Views: 318
Reputation: 58162
TL;DR: For a multithreaded program on x86-64, inc rax
can neither cause nor be affected by a data race.
At the machine level, there are two senses of "atomic" that people usually use.
One is atomicity with respect to concurrent access by multiple cores. In this sense, the question doesn't really even make sense, because cores do not share registers; each has its own independent set. So a register-only instruction like inc rax
cannot affect, or be affected by, anything that another core may be doing. There is certainly no data race issue to worry about.
Atomicity concerns in this sense only arise when two or more cores are accessing a shared resource - primarily memory.
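To make the distinction concrete, here is a minimal C++ sketch (not taken from the question; the names count_locally and run_threads are made up for illustration). Each thread increments a private local variable, which a compiler will normally keep in a register, so no synchronization is needed for it. Only the single update to the shared total in memory has to be atomic.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Each thread increments its own local counter. A local like this normally
// lives in a register (or on the thread's private stack), so no other thread
// can observe or disturb it. An optimizing compiler may even collapse the
// whole loop; the point is only that nothing here is shared.
std::uint64_t count_locally(std::uint64_t iterations) {
    std::uint64_t local = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        ++local;  // register-only work, comparable to inc rax
    }
    return local;
}

// Runs num_threads threads. Each counts privately, then publishes its result
// once into a shared total. The shared total lives in memory, so that one
// update is the only place where atomicity between cores matters.
std::uint64_t run_threads(unsigned num_threads, std::uint64_t iterations) {
    std::atomic<std::uint64_t> shared_total{0};
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&shared_total, iterations] {
            const std::uint64_t mine = count_locally(iterations);
            shared_total.fetch_add(mine, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return shared_total.load();
}
```

Calling run_threads(4, 100000) should always return 400000. If shared_total were a plain integer updated with += from every thread, that would be a data race, even though the register increments inside count_locally never are.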
The other is atomicity on a single core with respect to interrupts - if a hardware interrupt or exception occurs while an instruction is executing on the same core, what happens, and what machine state is observed by the interrupt handler? Here we do have to think about registers, because the interrupt handler can observe the same registers that the main code was using.
The answer is that x86 has precise interrupts, where interrupts appear to occur "between instructions". When calling the interrupt handler, the CPU pushes CS:RIP onto the stack, and the architectural state of the machine (registers, memory, etc) is as if:
the instruction pointed to by CS:RIP, and all subsequent instructions, have not begun to execute at all; the architectural state reflects none of their effects.
all instructions previous to CS:RIP have completely finished, and the architectural state reflects all of their effects.
On an old-fashioned in-order scalar CPU, this is easily accomplished by having the CPU check for interrupts as a step in between the completion of one instruction and the execution of the next. On a pipelined CPU, it takes more work; if there are several instructions in flight, the CPU may wait for some of them to retire, and abort the others.
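The in-order case can be sketched as a toy interpreter. This is a hypothetical model, not how any real x86 core is built: ArchState, Op and run_until_interrupt are invented names, and the "interrupt" is just an index at which the loop stops. It shows why checking for interrupts only between instructions gives the handler an all-or-nothing view.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy machine: one register and an instruction pointer.
struct ArchState {
    std::uint64_t rax = 0;
    std::size_t rip = 0;  // index of the next instruction to execute
};

enum class Op { IncRax, DecRax };

// Executes program in order. If interrupt_before is set, the simulated
// interrupt is taken at the boundary just before that instruction index,
// and the state the "handler" would observe is returned.
ArchState run_until_interrupt(const std::vector<Op>& program,
                              std::optional<std::size_t> interrupt_before) {
    ArchState s;
    while (s.rip < program.size()) {
        // Interrupts are checked only at instruction boundaries, never in
        // the middle of an instruction.
        if (interrupt_before && *interrupt_before == s.rip) {
            return s;
        }
        switch (program[s.rip]) {
            case Op::IncRax: ++s.rax; break;
            case Op::DecRax: --s.rax; break;
        }
        ++s.rip;  // the instruction has completely finished
    }
    return s;
}
```

With a program of three IncRax instructions and an interrupt before index 2, the handler sees rax equal to 2 and rip equal to 2: the first two increments are fully visible and the third has not started. A pipelined core has to produce the same observable result, even though internally several instructions are in flight.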
For more details, see "When an interrupt occurs, what happens to instructions in the pipeline?"
There are a few exceptions to this rule: e.g. the AVX-512 scatter/gather instructions may be partially completed when an interrupt occurs, so that some of the loads/stores have been done and others have not. But the CPU updates the registers (in particular the mask register) in such a way that when returning to execute the instruction again, only the remaining loads/stores will be done.
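The restart behaviour can be modelled roughly as follows. This is a toy sketch under stated assumptions, not real intrinsics: toy_gather, kLanes and the budget parameter are hypothetical, and the lane order used here is an arbitrary choice for the model. The idea it illustrates is that each completed lane clears its mask bit, so re-running the same "instruction" with the updated mask only touches the lanes that are still pending.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;

// Toy gather: for each lane whose mask bit is set, loads table[index[lane]]
// into dst[lane] and clears that mask bit. budget limits how many lanes may
// complete before a simulated interrupt cuts the instruction short.
// Returns the number of lanes completed in this call.
std::size_t toy_gather(std::array<std::int32_t, kLanes>& dst,
                       const std::int32_t* table,
                       const std::array<std::size_t, kLanes>& index,
                       std::uint8_t& mask,
                       std::size_t budget) {
    std::size_t done = 0;
    for (std::size_t lane = 0; lane < kLanes && done < budget; ++lane) {
        const std::uint8_t bit = static_cast<std::uint8_t>(1u << lane);
        if (mask & bit) {
            dst[lane] = table[index[lane]];
            // Record progress in the mask so a restart skips this lane.
            mask = static_cast<std::uint8_t>(mask & ~bit);
            ++done;
        }
    }
    return done;
}
```

If the first call is "interrupted" after three lanes, the mask goes from 0xFF to 0xF8. Calling toy_gather again with the same arguments finishes the remaining five lanes and leaves the mask at zero, without redoing the three loads that already completed.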
From the point of view of an application on a multitasking operating system, threads can run simultaneously on several cores, or run sequentially on a single core (or some combination). In the first case, there is no problem with inc rax
as the registers are not shared between cores. In the second case, each thread still has its own register set as part of its context. Your thread may be interrupted by a hardware interrupt at any time (e.g. timer tick), and the OS may then decide to schedule in a different thread. To do so, it saves your thread's context, including the register contents at the time of the interrupt - and since we have precise interrupts, these contents reflect instructions in an all-or-nothing fashion. So inc rax
is atomic for that purpose; when another thread gets control, the saved context of your thread has either all the effects of inc rax
or none of them. (And it usually doesn't even matter, because the only machine state affected by inc rax
is registers, and other threads don't normally try to observe the saved context of threads which are scheduled out, even if the OS provides a way to do that.)
Upvotes: 2