Reputation: 1361
I have seen some discussion lately about whether there is a difference between a counter implemented using atomic increment/load, and one using a mutex to synchronise increment/load.
Are the following counter implementations functionally equivalent?
package main

import (
    "sync"
    "sync/atomic"
)

type Counter interface {
    Inc()
    Load() int64
}

// Atomic Implementation
type AtomicCounter struct {
    counter int64
}

func (c *AtomicCounter) Inc() {
    atomic.AddInt64(&c.counter, 1)
}

func (c *AtomicCounter) Load() int64 {
    return atomic.LoadInt64(&c.counter)
}

// Mutex Implementation
type MutexCounter struct {
    counter int64
    lock    sync.Mutex
}

func (c *MutexCounter) Inc() {
    c.lock.Lock()
    defer c.lock.Unlock()
    c.counter++
}

func (c *MutexCounter) Load() int64 {
    c.lock.Lock()
    defer c.lock.Unlock()
    return c.counter
}
I have run a bunch of test cases (Playground Link) and haven't been able to see any difference in behaviour. Running the tests on my machine, the numbers get printed out of order for all the PrintAll test functions.
Can someone confirm whether they are equivalent or if there are any edge cases where these are different? Is there a preference to use one technique over the other? The atomic documentation does say it should only be used in special cases.
Update: The original question that prompted me to ask this was this one; however, it is now on hold, and I feel this aspect deserves its own discussion. In the answers it seemed that using a mutex would guarantee correct results whereas atomics might not, specifically if the program is running on multiple threads; hence the questions above.
Another Update:
I've found some code where the two counters behave differently. When run on my machine, this function will finish with MutexCounter, but not with AtomicCounter. Don't ask me why you would ever run this code:
// Uses the Counter interface defined above; also needs "fmt",
// "math/rand" and "time" added to the import block.
func TestCounter(counter Counter) {
    end := make(chan interface{})
    for i := 0; i < 1000; i++ {
        go func() {
            r := rand.New(rand.NewSource(time.Now().UnixNano()))
            for j := 0; j < 10000; j++ {
                k := int64(r.Uint32())
                if k >= 0 { // always true: a Uint32 widened to int64 is never negative
                    counter.Inc()
                }
            }
        }()
    }
    go func() {
        prevValue := int64(0)
        for counter.Load() != 10000000 { // Sometimes this condition is never met with AtomicCounter.
            val := counter.Load()
            if val%1000000 == 0 && val != prevValue {
                prevValue = val
            }
        }
        fmt.Println("Count:", counter.Load()) // print before signalling, so the output isn't lost when the caller returns
        end <- true
    }()
    <-end
}
Upvotes: 37
Views: 20193
Reputation: 1001
Here are some benchmarks from my Mac M1: atomic writes come out about 2x faster than mutex-protected writes, and atomic reads around 25x faster than mutex-protected reads.
BenchmarkAtomicWrite-8 100000000 11.28 ns/op 0 B/op 0 allocs/op
BenchmarkMutexWrite-8 54016642 22.16 ns/op 0 B/op 0 allocs/op
BenchmarkAtomicRead-8 1000000000 0.8774 ns/op 0 B/op 0 allocs/op
BenchmarkMutexRead-8 54548967 22.08 ns/op 0 B/op 0 allocs/op
Feel free to run it on your device:
package main_test

import (
    "sync"
    "sync/atomic"
    "testing"
)

type atom struct {
    value atomic.Int64
}

type mute struct {
    value int64
    lock  sync.Mutex
}

var (
    a = atom{}
    m = mute{}
)

func BenchmarkAtomicWrite(b *testing.B) {
    for i := 0; i < b.N; i++ {
        a.value.Add(1)
    }
}

func BenchmarkMutexWrite(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m.lock.Lock()
        m.value++
        m.lock.Unlock()
    }
}

func BenchmarkAtomicRead(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = a.value.Load()
    }
}

func BenchmarkMutexRead(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m.lock.Lock()
        _ = m.value
        m.lock.Unlock()
    }
}
Upvotes: 6
Reputation: 79704
There is no difference in behavior. There is a difference in performance.
Mutexes are slow, due to their setup and teardown, and because they block other goroutines for the duration of the lock.
Atomic operations are fast because they use an atomic CPU instruction when possible, rather than relying on external locks.
Therefore, whenever it is feasible, atomic operations should be preferred.
Upvotes: 36
Reputation: 1361
Alright, I'm going to attempt to self-answer for some closure. Edits are welcome.
There is some discussion about the atomic package here. But to quote the most telling comments:
The very short summary is that if you have to ask, you should probably avoid the package. Or, read the atomic operations chapter of the C++11 standard; if you understand how to use those operations safely in C++, then you are more than capable of using Go's sync/atomic package.
That said, sticking to atomic.AddInt32 and atomic.LoadInt32 is safe as long as you are just reporting statistical information, and not actually relying on the values carrying any meaning about the state of the different goroutines.
And:
What atomicity does not guarantee, is any ordering of observability of values. I mean, atomic.AddInt32() does only guarantee that what this operation stores at &cnt will be exactly *cnt + 1 (with the value of *cnt being what the CPU executing the active goroutine fetched from memory when the operation started); it does not provide any guarantee that if another goroutine will attempt to read this value at the same time it will fetch that same value *cnt + 1.
On the other hand, mutexes and channels guarantee strict ordering of accesses to values being shared/passed around (subject to the rules of the Go memory model).
In regards to why the code sample in the question never finishes: the goroutine that reads the counter sits in a very tight loop. When using the atomic counter there are no synchronisation events (e.g. mutex lock/unlock, syscalls), which means the goroutine never yields control. As a result it starves the thread it is running on and prevents the scheduler from allocating time to any of the other goroutines assigned to that thread, including the ones that increment the counter, so the counter never reaches 10000000.
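To make the yielding point concrete, here is a minimal sketch of the same shape of program (smaller counts, and a hypothetical target constant of my own choosing) where an explicit runtime.Gosched() in the polling loop hands the thread back to the scheduler:

package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
)

func main() {
    var counter int64
    const target = 1000000 // hypothetical, much smaller than the question's 10000000

    // Incrementing goroutines, same shape as in the question.
    for i := 0; i < 8; i++ {
        go func() {
            for j := 0; j < target/8; j++ {
                atomic.AddInt64(&counter, 1)
            }
        }()
    }

    // Polling loop. The explicit yield acts as the synchronisation event
    // the pure-atomic version lacks, so the polling goroutine no longer
    // starves the thread it runs on.
    for atomic.LoadInt64(&counter) != target {
        runtime.Gosched()
    }
    fmt.Println("Count:", atomic.LoadInt64(&counter))
}

Worth noting: since Go 1.14 the runtime can preempt tight loops asynchronously, so this particular starvation is much harder to reproduce on current toolchains.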
Upvotes: 20
Reputation: 55543
Atomics are faster in the common case: the compiler translates each call to a function from the sync/atomic package into a special set of machine instructions which basically operate at the CPU level. For instance, on x86 architectures, atomic.AddInt64 is translated to a plain ADD-class instruction carrying the LOCK prefix (see this for an example), with the prefix ensuring a coherent view of the updated memory location across all the CPUs in the system.
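If you want to see this for yourself, the compiler's assembly output can be dumped directly. A minimal sketch (the file name is hypothetical, and the exact mnemonics assume an amd64 machine):

package main

import "sync/atomic"

var n int64

func main() {
    // The compiler intrinsifies this call on mainstream platforms,
    // so no function call survives in the generated code.
    atomic.AddInt64(&n, 1)
}

Building it with go build -gcflags=-S . prints the generated assembly; on amd64 the increment should show up as a LOCK-prefixed XADDQ rather than a call into sync/atomic.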
A mutex is a much more complicated thing as it, in the end, wraps some bit of the native, OS-specific thread synchronization API (for instance, on Linux that's futex).
On the other hand, the Go runtime is pretty heavily optimized when it comes to synchronization (which is kinda expected, given one of the main selling points of Go), and the mutex implementation tries to avoid hitting the kernel to perform synchronization between goroutines, if possible, and to carry it out completely in the Go runtime itself.
This might explain the lack of any noticeable difference in the timings in your benchmarks, provided the contention over the mutexes was reasonably low.
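As a toy illustration of that fast-path idea (an illustration only; the real sync.Mutex parks waiting goroutines in the runtime rather than spinning like this), a lock can try an atomic compare-and-swap first and touch heavier machinery only under contention:

package main

import (
    "runtime"
    "sync/atomic"
)

// spinLock is a toy lock: an atomic CAS fast path that never enters the
// kernel, with a naive yield-and-retry loop as the slow path.
type spinLock struct{ state int32 }

func (l *spinLock) Lock() {
    for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
        runtime.Gosched() // contended: yield and try again
    }
}

func (l *spinLock) Unlock() {
    atomic.StoreInt32(&l.state, 0)
}

func main() {
    var l spinLock
    l.Lock()
    // Uncontended: the Lock above was a single CAS, no kernel involvement.
    l.Unlock()
}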
Still, I feel obliged to note, just in case, that atomics and higher-level synchronization facilities are designed to solve different tasks. For example, you can't use atomics to protect some memory state during the execution of a whole function, or even, in the general case, of a single statement.
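As a contrived sketch of that point, here is an invariant spanning two fields which a mutex can maintain but separate atomic operations cannot:

package main

import "sync"

// pair must always satisfy the invariant a == b. Incrementing a and b
// with two separate atomic adds would let a reader observe the update
// half-done; the mutex makes the two increments one indivisible
// critical section.
type pair struct {
    mu   sync.Mutex
    a, b int64
}

func (p *pair) Bump() {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.a++
    p.b++
}

func (p *pair) Snapshot() (int64, int64) {
    p.mu.Lock()
    defer p.mu.Unlock()
    return p.a, p.b // always equal, never a torn pair
}

func main() {
    var p pair
    p.Bump()
    _, _ = p.Snapshot()
}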
Upvotes: 11