kingwah001

Reputation: 119

Why is atomic.StoreUint32 preferred over a normal assignment in sync.Once?

While reading the Go source code, I have a question about the code in src/sync/once.go:

func (o *Once) Do(f func()) {
    // Note: Here is an incorrect implementation of Do:
    //
    //  if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
    //      f()
    //  }
    //
    // Do guarantees that when it returns, f has finished.
    // This implementation would not implement that guarantee:
    // given two simultaneous calls, the winner of the cas would
    // call f, and the second would return immediately, without
    // waiting for the first's call to f to complete.
    // This is why the slow path falls back to a mutex, and why
    // the atomic.StoreUint32 must be delayed until after f returns.

    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Why is atomic.StoreUint32 used rather than, say, o.done = 1? Are these not equivalent? What are the differences?

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with weak memory model?
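For context, here is a minimal program exercising the guarantee the comment describes (the names loadConfig and config are mine, not from the standard library): every caller that returns from Do observes the fully initialized state, including callers that lost the race to run f:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	once   sync.Once
	config map[string]string // initialized exactly once, inside once.Do
)

func loadConfig() map[string]string {
	once.Do(func() {
		config = map[string]string{"mode": "production"}
	})
	// Safe: Do does not return until f has completed, so config
	// is fully initialized here even if another goroutine ran f.
	return config
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = loadConfig()["mode"] // never nil, never partially built
		}()
	}
	wg.Wait()
	fmt.Println(loadConfig()["mode"]) // prints "production"
}
```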

Upvotes: 10

Views: 1010

Answers (4)

Quân Anh Mai

Reputation: 630

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with weak memory model?

Yes, you are thinking in the right direction, but note that even if the target machine has a strong memory model, the Go compiler can and will reorder instructions as long as the result adheres to the Go memory model. Conversely, if the machine's memory model is weaker than the language's, the compiler has to emit additional barriers so that the final code complies with the language specification.

Let's consider the implementation of sync.Once without sync/atomic, with modifications for easier explaining:

func (o *Once) Do(f func()) {
    if o.done == 0 { // (1)
        o.m.Lock() // (2)
        defer o.m.Unlock() // (3)
        if o.done == 0 { // (4)
            f() // (5)
            o.done = 1 // (6)
        }
    }
}

If a goroutine observes that o.done != 0, it returns immediately. As a result, the function must ensure that f() happens before any read that observes 1 from o.done.

  • If the read is at (4), it is protected by the mutex, so it is guaranteed to happen after the earlier critical section that executed f and set o.done to 1.
  • If the read is at (1), we don't have the protection of the mutex, so we must construct a synchronizes-with relationship between the write (6) in the writing goroutine and the read (1) in the current goroutine. Then, since (5) is sequenced before (6), a read that observes 1 at (1) is guaranteed to happen after the execution of (5), by transitivity of the happens-before relationship.

As a result, the write (6) must have release semantics, and the read (1) must have acquire semantics. Since Go does not expose acquire loads and release stores, we must resort to the stronger ordering, sequential consistency, provided by atomic.LoadUint32 and atomic.StoreUint32.

Final note: since accesses to memory locations no larger than a machine word are guaranteed to be atomic, this use of atomic here has nothing to do with atomicity and everything to do with synchronisation.
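The release/acquire publication pattern described above can be sketched with a hypothetical payload/ready pair (not from sync.Once itself): the atomic store plays the role of write (6), and the atomic load plays the role of read (1):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Publication via an atomic flag: the atomic store "releases" the
// preceding plain write to payload, and the atomic load "acquires"
// it, so a reader that observes ready == 1 is guaranteed to also
// observe payload == 42.
var (
	payload int32
	ready   uint32
)

func publish() {
	payload = 42                  // analogous to (5): plain write
	atomic.StoreUint32(&ready, 1) // analogous to (6): must come after (5)
}

func tryConsume() (int32, bool) {
	if atomic.LoadUint32(&ready) == 1 { // analogous to (1)
		return payload, true // guaranteed to observe 42
	}
	return 0, false
}

func main() {
	done := make(chan struct{})
	go func() { publish(); close(done) }()
	<-done
	v, ok := tryConsume()
	fmt.Println(v, ok) // prints "42 true"
}
```

With a plain (non-atomic) store of ready, nothing would stop the compiler or hardware from reordering it before the write to payload.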

Upvotes: 1

Mr_Pink

Reputation: 109417

Remember, unless you are writing the assembly by hand, you are not programming to your machine's memory model, you are programming to Go's memory model. This means that even if primitive assignments are atomic with your architecture, Go requires the use of the atomic package to ensure correctness across all supported architectures.

Access to the done flag outside of the mutex only needs to be safe, not strictly ordered, so atomic operations can be used instead of always obtaining a lock with a mutex. This is an optimization to make the fast path as efficient as possible, allowing sync.Once to be used in hot paths.

The mutex used for doSlow is for mutual exclusion within that function alone, to ensure that only one caller ever makes it to f() before the done flag is set. The flag is written using atomic.StoreUint32, because it may happen concurrently with atomic.LoadUint32 outside of the critical section protected by the mutex.

Reading the done field concurrently with writes, even atomic writes, is a data race. Just because the field is read atomically does not mean you can use normal assignment to write it; hence the flag is checked first with atomic.LoadUint32 and written with atomic.StoreUint32.

The direct read of done within doSlow is safe, because it is protected from concurrent writes by the mutex. Reading the value concurrently with atomic.LoadUint32 is safe because both are read operations.
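Putting this together, here is a minimal Once-like sketch (a hypothetical type mirroring the quoted source, not the real sync.Once) with comments marking which accesses rely on the mutex and which rely on atomics:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// once is a toy re-implementation for illustration only;
// the real one lives in src/sync/once.go.
type once struct {
	done uint32
	m    sync.Mutex
}

func (o *once) Do(f func()) {
	// Fast path: atomic load, because it may run concurrently
	// with the atomic store in doSlow.
	if atomic.LoadUint32(&o.done) == 0 {
		o.doSlow(f)
	}
}

func (o *once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	// Plain read is safe here: the mutex excludes the only writer.
	if o.done == 0 {
		// The store is delayed until f returns, so no caller can
		// observe done == 1 before f has finished.
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}

func main() {
	var o once
	n := 0
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			o.Do(func() { n++ })
		}()
	}
	wg.Wait()
	fmt.Println(n) // prints 1: f ran exactly once
}
```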

Upvotes: 5

frihed

Reputation: 1

Atomic operations can be used to synchronize the execution of different goroutines.

Without synchronization, even if a goroutine observes o.done == 1, there is no guarantee that it will observe the effect of f().

Upvotes: -1

ken

Reputation: 319

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 {       // #1
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {                            // #2
        defer atomic.StoreUint32(&o.done, 1)    // #3
        f()
    }
}
  • #1 and #3: #1 is a read and #3 is a write, and #1 runs outside the critical section, so this pair is only safe because both accesses are atomic.
  • #2 and #3: both are inside the critical section protected by the mutex, so the plain read at #2 is safe.

Upvotes: 0
