Reputation: 1361
I have seen some discussion lately about whether there is a difference between a counter implemented using atomic increment/load, and one using a mutex to synchronise increment/load.
Are the following counter implementations functionally equivalent?
package main

import (
    "sync"
    "sync/atomic"
)

type Counter interface {
    Inc()
    Load() int64
}

// Atomic Implementation
type AtomicCounter struct {
    counter int64
}

func (c *AtomicCounter) Inc() {
    atomic.AddInt64(&c.counter, 1)
}

func (c *AtomicCounter) Load() int64 {
    return atomic.LoadInt64(&c.counter)
}

// Mutex Implementation
type MutexCounter struct {
    counter int64
    lock    sync.Mutex
}

func (c *MutexCounter) Inc() {
    c.lock.Lock()
    defer c.lock.Unlock()
    c.counter++
}

func (c *MutexCounter) Load() int64 {
    c.lock.Lock()
    defer c.lock.Unlock()
    return c.counter
}
I have run a bunch of test cases (Playground Link) and haven't been able to see any difference in behaviour. Running the tests on my machine, the numbers get printed out of order for all the PrintAll test functions.
Can someone confirm whether they are equivalent or if there are any edge cases where these are different? Is there a preference to use one technique over the other? The atomic documentation does say it should only be used in special cases.
Update: The original question that prompted me to ask this was this one; however, it is now on hold, and I feel this aspect deserves its own discussion. In the answers it seemed that using a mutex would guarantee correct results whereas atomics might not, specifically if the program is running on multiple threads; hence the questions above.
Another Update:
I've found some code where the two counters behave differently. When run on my machine, this function will finish with MutexCounter, but not with AtomicCounter. Don't ask me why you would ever run this code:
// Uses the Counter interface defined above; also needs "fmt",
// "math/rand" and "time" added to the import block.
func TestCounter(counter Counter) {
    end := make(chan interface{})
    for i := 0; i < 1000; i++ {
        go func() {
            r := rand.New(rand.NewSource(time.Now().UnixNano()))
            for j := 0; j < 10000; j++ {
                k := int64(r.Uint32())
                if k >= 0 { // always true: a Uint32 widened to int64 is never negative
                    counter.Inc()
                }
            }
        }()
    }
    go func() {
        prevValue := int64(0)
        for counter.Load() != 10000000 { // Sometimes this condition is never met with AtomicCounter.
            val := counter.Load()
            if val%1000000 == 0 && val != prevValue {
                prevValue = val
            }
        }
        fmt.Println("Count:", counter.Load()) // print before signalling, so the output isn't lost when the caller returns
        end <- true
    }()
    <-end
}
Upvotes: 37
Views: 20193
Reputation: 1001
Here are some benchmarks from my Mac M1: atomic writes come out about 2x faster than mutex-protected writes, and atomic reads around 25x faster than mutex-protected reads.
BenchmarkAtomicWrite-8 100000000 11.28 ns/op 0 B/op 0 allocs/op
BenchmarkMutexWrite-8 54016642 22.16 ns/op 0 B/op 0 allocs/op
BenchmarkAtomicRead-8 1000000000 0.8774 ns/op 0 B/op 0 allocs/op
BenchmarkMutexRead-8 54548967 22.08 ns/op 0 B/op 0 allocs/op
Feel free to run it on your device:
package main_test

import (
    "sync"
    "sync/atomic"
    "testing"
)

type atom struct {
    value atomic.Int64
}

type mute struct {
    value int64
    lock  sync.Mutex
}

var (
    a = atom{}
    m = mute{}
)

func BenchmarkAtomicWrite(b *testing.B) {
    for i := 0; i < b.N; i++ {
        a.value.Add(1)
    }
}

func BenchmarkMutexWrite(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m.lock.Lock()
        m.value++
        m.lock.Unlock()
    }
}

func BenchmarkAtomicRead(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = a.value.Load()
    }
}

func BenchmarkMutexRead(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m.lock.Lock()
        _ = m.value
        m.lock.Unlock()
    }
}
Upvotes: 6
Reputation: 79704
There is no difference in behavior. There is a difference in performance.
Mutexes are slow, due to their setup and teardown, and because they block other goroutines for the duration of the lock.
Atomic operations are fast because they use an atomic CPU instruction when possible, rather than relying on external locks.
Therefore, whenever it is feasible, atomic operations should be preferred.
Upvotes: 36
Reputation: 1361
Alright, I'm going to attempt to self-answer for some closure. Edits are welcome.
There is some discussion about the atomic package here. But to quote the most telling comments:
The very short summary is that if you have to ask, you should probably avoid the package. Or, read the atomic operations chapter of the C++11 standard; if you understand how to use those operations safely in C++, then you are more than capable of using Go's sync/atomic package.
That said, sticking to atomic.AddInt32 and atomic.LoadInt32 is safe as long as you are just reporting statistical information, and not actually relying on the values carrying any meaning about the state of the different goroutines.
And:
What atomicity does not guarantee, is any ordering of observability of values. I mean, atomic.AddInt32() does only guarantee that what this operation stores at &cnt will be exactly *cnt + 1 (with the value of *cnt being what the CPU executing the active goroutine fetched from memory when the operation started); it does not provide any guarantee that if another goroutine will attempt to read this value at the same time it will fetch that same value *cnt + 1.
On the other hand, mutexes and channels guarantee strict ordering of accesses to values being shared/passed around (subject to the rules of the Go memory model).
In regards to why the code sample in the question never finishes: the goroutine that reads the counter sits in a very tight loop. When using the atomic counter there are no synchronisation events (e.g. mutex lock/unlock, syscalls), which means the goroutine never yields control. As a result it starves the thread it is running on and prevents the scheduler from allocating time to any of the other goroutines assigned to that thread, including the ones that increment the counter, so the counter never reaches 10000000.
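To make the yielding point concrete, here is a minimal sketch of the same shape of program (smaller counts, and a hypothetical target constant of my own choosing) where an explicit runtime.Gosched() in the polling loop hands the thread back to the scheduler:

package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
)

func main() {
    var counter int64
    const target = 1000000 // hypothetical, much smaller than the question's 10000000

    // Incrementing goroutines, same shape as in the question.
    for i := 0; i < 8; i++ {
        go func() {
            for j := 0; j < target/8; j++ {
                atomic.AddInt64(&counter, 1)
            }
        }()
    }

    // Polling loop. The explicit yield acts as the synchronisation event
    // the pure-atomic version lacks, so the polling goroutine no longer
    // starves the thread it runs on.
    for atomic.LoadInt64(&counter) != target {
        runtime.Gosched()
    }
    fmt.Println("Count:", atomic.LoadInt64(&counter))
}

Worth noting: since Go 1.14 the runtime can preempt tight loops asynchronously, so this particular starvation is much harder to reproduce on current toolchains.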
Upvotes: 20
Reputation: 55543
Atomics are faster in the common case: the compiler translates each call to a function from the sync/atomic package into a special set of machine instructions which basically operate at the CPU level. For instance, on x86 architectures, atomic.AddInt64 is translated to a plain ADD-class instruction carrying the LOCK prefix (see this for an example), with the prefix ensuring a coherent view of the updated memory location across all the CPUs in the system.
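If you want to see this for yourself, the compiler's assembly output can be dumped directly. A minimal sketch (the file name is hypothetical, and the exact mnemonics assume an amd64 machine):

package main

import "sync/atomic"

var n int64

func main() {
    // The compiler intrinsifies this call on mainstream platforms,
    // so no function call survives in the generated code.
    atomic.AddInt64(&n, 1)
}

Building it with go build -gcflags=-S . prints the generated assembly; on amd64 the increment should show up as a LOCK-prefixed XADDQ rather than a call into sync/atomic.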
A mutex is a much more complicated thing as it, in the end, wraps some bit of the native, OS-specific thread synchronization API (for instance, on Linux that's futex).
On the other hand, the Go runtime is pretty heavily optimized when it comes to synchronization (which is kinda expected, given one of the main selling points of Go), and the mutex implementation tries to avoid hitting the kernel to perform synchronization between goroutines, if possible, and to carry it out completely in the Go runtime itself.
This might explain the lack of any noticeable difference in the timings in your benchmarks, provided the contention over the mutexes was reasonably low.
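As a toy illustration of that fast-path idea (an illustration only; the real sync.Mutex parks waiting goroutines in the runtime rather than spinning like this), a lock can try an atomic compare-and-swap first and touch heavier machinery only under contention:

package main

import (
    "runtime"
    "sync/atomic"
)

// spinLock is a toy lock: an atomic CAS fast path that never enters the
// kernel, with a naive yield-and-retry loop as the slow path.
type spinLock struct{ state int32 }

func (l *spinLock) Lock() {
    for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
        runtime.Gosched() // contended: yield and try again
    }
}

func (l *spinLock) Unlock() {
    atomic.StoreInt32(&l.state, 0)
}

func main() {
    var l spinLock
    l.Lock()
    // Uncontended: the Lock above was a single CAS, no kernel involvement.
    l.Unlock()
}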
Still, I feel obliged to note, just in case, that atomics and higher-level synchronization facilities are designed to solve different tasks. For example, you can't use atomics to protect some memory state during the execution of a whole function, or even, in the general case, of a single statement.
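As a contrived sketch of that point, here is an invariant spanning two fields which a mutex can maintain but separate atomic operations cannot:

package main

import "sync"

// pair must always satisfy the invariant a == b. Incrementing a and b
// with two separate atomic adds would let a reader observe the update
// half-done; the mutex makes the two increments one indivisible
// critical section.
type pair struct {
    mu   sync.Mutex
    a, b int64
}

func (p *pair) Bump() {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.a++
    p.b++
}

func (p *pair) Snapshot() (int64, int64) {
    p.mu.Lock()
    defer p.mu.Unlock()
    return p.a, p.b // always equal, never a torn pair
}

func main() {
    var p pair
    p.Bump()
    _, _ = p.Snapshot()
}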
Upvotes: 11