kravemir

Reputation: 10996

How expensive are atomic operations?

I'm diving into multi-threaded programming and thinking about lock-free reference counting using atomic operations.

It's obvious that an atomic operation can be slower than its non-atomic counterpart, at least by a constant factor. What worries me is the additional synchronization between CPUs that an atomic operation may require.

I wonder whether (and if so, how much) executing an atomic operation on core A affects the performance of other cores which:

  1. are doing nothing related to core A
  2. are executing different threads of the same process as core A
  3. are executing an atomic operation
  4. are executing an atomic operation and are executing different threads of the same process as core A
  5. are executing any memory-related operation, i.e. load, store, ...
  6. are executing any memory-related operation in the same memory region (cache line, page?) as core A

Upvotes: 11

Views: 5656

Answers (2)

David Schwartz

Reputation: 182761

I'm comparing an atomic read-modify-write operation to the corresponding non-atomic operation on modern x86 CPUs.

are doing nothing related to core A

No effect.

are executing different threads of the same process as core A

No effect.

are executing an atomic operation

No effect.

are executing an atomic operation and are executing different threads of the same process as core A

No effect.

are executing any memory-related operation, i.e. load, store, ...

No effect.

are executing any memory-related operation in the same memory region (cache line, page?) as core A

The core performing the atomic operation has to acquire the cache line exclusively, stealing it from any other core(s) that hold it in their caches. No other core can access that line until the atomic operation has completed to cache and inter-cache traffic has synchronized it, so that the line is either shared or exclusive in the other core.
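
To make the cache-line point concrete, here is a minimal C++ sketch, not a benchmark and not a definitive implementation. It assumes 64-byte cache lines, and the names `SharedLine`, `SeparateLines`, and `hammer` are made up for illustration. Two threads each increment their own counter. In `SharedLine` both counters sit in one cache line, so every atomic increment has to pull the line away from the other core. In `SeparateLines` each counter has a line to itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumption: 64-byte cache lines (typical for current x86 parts).
constexpr std::size_t kCacheLine = 64;

// Both counters share one cache line (given the assumed line size).
struct alignas(kCacheLine) SharedLine {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Each counter is aligned to its own cache line.
struct SeparateLines {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};
static_assert(sizeof(SeparateLines) >= 2 * kCacheLine,
              "counters must not share a cache line");

// Runs two threads; each one increments only its own counter.
// Returns the sum of both counters once the threads have finished.
template <typename Counters>
std::uint64_t hammer(std::uint64_t iterations) {
    Counters c;
    auto work = [iterations](std::atomic<std::uint64_t>& counter) {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(work, std::ref(c.a));
    std::thread t2(work, std::ref(c.b));
    t1.join();
    t2.join();
    // Atomic increments are never lost, whatever the layout.
    assert(c.a.load() == iterations && c.b.load() == iterations);
    return c.a.load() + c.b.load();
}
```

Both layouts produce the same counts; only the cost differs. Timing `hammer<SharedLine>(n)` against `hammer<SeparateLines>(n)` on your own machine should give a rough idea of what the line ping-pong costs, though the result depends heavily on the CPU and on where the threads get scheduled.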

The main cost of atomic operations is to the pipelines of the core executing the atomic instruction. Because the atomic operation must take place all at once at a well-defined place, it (mostly) cannot overlap other operations. That's a huge penalty for a superscalar CPU that gains performance by keeping lots of instructions in various stages of processing.
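
If you want to see the per-core cost on your own hardware, a rough single-threaded harness along these lines might help. It is only a sketch: `Timing`, `time_plain`, and `time_atomic` are hypothetical names, and the `volatile` baseline forces a load and a store on every iteration, so it is not a perfect stand-in for ordinary code. Treat any numbers it produces as indicative at best.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

struct Timing {
    std::uint64_t final_value;  // counter value after the loop
    double ns_per_op;           // average wall-clock time per increment
};

// Non-atomic read-modify-write. volatile only stops the compiler from
// collapsing the loop; it does not make the increment atomic.
inline Timing time_plain(std::uint64_t n) {
    assert(n > 0);
    volatile std::uint64_t counter = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        counter = counter + 1;
    }
    auto stop = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {counter, ns / static_cast<double>(n)};
}

// Atomic read-modify-write on an uncontended counter.
inline Timing time_atomic(std::uint64_t n) {
    assert(n > 0);
    std::atomic<std::uint64_t> counter{0};
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        counter.fetch_add(1, std::memory_order_seq_cst);
    }
    auto stop = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {counter.load(), ns / static_cast<double>(n)};
}
```

Call both with the same large `n` and compare `ns_per_op`. Since only one thread runs, no cache line is being stolen, so any difference comes from the executing core itself.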

Upvotes: 8

SergeyA

Reputation: 62573

Many people think that atomic operations are cheap. However, that is not necessarily true, since "atomic operation" is a generalization. There are 3 basic types of atomic operations:

  1. Atomic store
  2. Atomic load
  3. Atomic read-modify-write (CompareAndSet, increment, decrement, etc.)

The first two are usually more or less cheap (on Intel, as we all know, an atomic load or store that needs no full fence costs exactly the same as its non-atomic friend). They do impose memory barriers, but a barrier is only relevant to the CPU that executes it, and CPUs work hard to make barriers efficient. The third one, however, might not be as cheap under contention. Code built on atomic CAS typically retries the operation in a loop until it succeeds, so under contention it might take significant time to complete.
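
Since the question is about reference counting, here is a small C++ sketch of where a CAS retry loop actually appears. The `RefCount` class and its method names are hypothetical, made up for this illustration. An unconditional increment can use a single `fetch_add`, while "take a reference only if the object is still alive" has to read the count, check it, and retry with `compare_exchange_weak` whenever another thread changed it in between.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference counter, for illustration only.
class RefCount {
public:
    // Unconditional increment: one atomic read-modify-write, no retry
    // loop in the source.
    void acquire() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Increment only while the count is non-zero. This needs a CAS loop:
    // each failed attempt means another thread modified the count first.
    bool try_acquire() {
        std::uint32_t seen = count_.load(std::memory_order_relaxed);
        while (seen != 0) {
            if (count_.compare_exchange_weak(seen, seen + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
            // On failure, `seen` now holds the current value; try again.
        }
        return false;  // count already dropped to zero
    }

    // Returns true when the caller just dropped the last reference.
    bool release() {
        std::uint32_t previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous != 0 && "release() called more often than acquire()");
        return previous == 1;
    }

    std::uint32_t value() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint32_t> count_{1};  // the creator holds one reference
};
```

Under heavy contention the loop in `try_acquire` may spin several times before it succeeds, which is the cost described above. `acquire` and `release` never retry in the source, although they still need exclusive ownership of the cache line.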

Upvotes: 8
