Gilrich

Reputation: 315

Efficiency measurements of Go's Once type

I have a piece of code that I want to run only once, for initialization. Until now I was using a sync.Mutex combined with an if clause to check whether it had already run. Later I came across the Once type and its Do() method in the same sync package.

Its implementation (https://golang.org/src/sync/once.go) is the following:

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 1 {
        return
    }
    // Slow-path.
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Looking at the code, it is basically the same thing I've been using before: a mutex combined with an if clause. However, the added function calls make this seem rather inefficient to me. I did some testing and tried various versions:

// test1: use sync.Once.
func test1() {
    o.Do(func() {
        // Do something
    })
    wg.Done()
}

// test2: always take the mutex, then check the flag.
func test2() {
    m.Lock()
    if !b {
        func() {
            // Do something
        }()
    }
    b = true
    m.Unlock()
    wg.Done()
}

// test3: double-checked locking: check the flag, lock, check again.
func test3() {
    if !b {
        m.Lock()
        if !b {
            func() {
                // Do something
            }()
            b = true
        }
        m.Unlock()
    }
    wg.Done()
}
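
For completeness, the shared state used by all three tests and the driver below looks roughly like this (the names are taken from the snippets; the full program is in the playground link at the end):

var (
    o     sync.Once      // used by test1
    m     sync.Mutex     // guards b in test2 and test3
    b     bool           // "already initialized" flag
    wg    sync.WaitGroup // lets main wait for all goroutines
    start time.Time      // timing
    end   time.Time
)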

I tested all versions by running the following code:

    wg.Add(10000)
    start = time.Now()
    for i := 0; i < 10000; i++ {
        go testX()
    }
    wg.Wait()
    end = time.Now()

    fmt.Printf("elapsed: %v\n", end.Sub(start).Nanoseconds())

with the following results:

elapsed: 8002700 //test1
elapsed: 5961600 //test2
elapsed: 5646700 //test3

Is it even worth using the Once type? It is convenient, but its performance is even worse than test2, which always serializes all goroutines.

Also, why are they using an atomic load for the if clause? The store happens inside the lock anyway.

Edit: Go playground link: https://play.golang.org/p/qlMxPYop7kS NOTICE: this doesn't show the timing results because time is fixed in the playground.

Upvotes: 0

Views: 1898

Answers (1)

icza

Reputation: 418297

That is not how you're supposed to measure code performance. You should use Go's built-in testing framework (the testing package and the go test command). See Order of the code and performance for details.

Let's create the testable code:

func f() {
    // Code that must only be run once
}

var testOnce = &sync.Once{}

func DoWithOnce() {
    testOnce.Do(f)
}

var (
    mu = &sync.Mutex{}
    b  bool
)

func DoWithMutex() {
    mu.Lock()
    if !b {
        f()
        b = true
    }
    mu.Unlock()
}

Let's write proper testing / benchmarking code using the testing package:

func BenchmarkOnce(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithOnce()
    }
}

func BenchmarkMutex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithMutex()
    }
}

We can run the benchmark with the following command:

go test -bench .

And here are the benchmarking results:

BenchmarkOnce-4         200000000                6.30 ns/op
BenchmarkMutex-4        100000000               20.0 ns/op
PASS

As you can see, using sync.Once was more than 3 times faster than using a sync.Mutex. Why? Because sync.Once has an "optimized", short path that uses only an atomic load to check whether the task has been run before, and if so, no mutex is used. The "slow" path is likely only used once, on the first call to Once.Do(). If many goroutines call DoWithOnce() concurrently, the slow path may be reached multiple times, but in the long run Once.Do() only needs a single atomic load.
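
To make that concrete, here is a small self-contained sketch (the counter, goroutine count and names are mine, purely illustrative) showing that the function passed to Do() runs exactly once even when many goroutines race on the first call; every subsequent call only performs the atomic load on the fast path:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var (
        once  sync.Once
        calls int32
        wg    sync.WaitGroup
    )
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // All goroutines race on Do(), but the closure below
            // executes exactly once; later calls only hit the
            // atomic-load fast path.
            once.Do(func() { atomic.AddInt32(&calls, 1) })
        }()
    }
    wg.Wait()
    fmt.Println("calls:", calls) // always prints "calls: 1"
}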

Parallel testing (from multiple goroutines)

Yes, the above benchmark only uses a single goroutine. But using multiple concurrent goroutines only makes the mutex's case worse, as it always has to acquire the mutex just to check whether the task still needs to be run, while sync.Once only performs an atomic load.

Nevertheless, let's benchmark it.

Here is the benchmark code using parallel testing:

func BenchmarkOnceParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}

func BenchmarkMutexParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithMutex()
        }
    })
}

I have 4 cores on my machine, so I'm going to use all 4 of them:

go test -bench Parallel -cpu=4

(You may omit the -cpu flag, in which case it defaults to GOMAXPROCS, the number of CPU cores available.)
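
Side note: the -cpu flag also accepts a comma-separated list, which is convenient if you want to see how both approaches scale with the number of cores, e.g.:

go test -bench Parallel -cpu=1,2,4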

And here are the results:

BenchmarkOnceParallel-4         500000000                3.04 ns/op
BenchmarkMutexParallel-4        20000000                93.7 ns/op

When "concurrency increases", the results are starting to become uncomparable in favor of sync.Once (in the above test, it's 30 times faster).

We may further increase the number of goroutines used by calling testing.B.SetParallelism(), but I got similar results when I set it to 100 (which means 400 goroutines were calling the benchmark code).
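
For reference, SetParallelism() has to be called before RunParallel(); a sketch of how that looks (the benchmark name is made up for this example):

func BenchmarkOnceMoreGoroutines(b *testing.B) {
    // With GOMAXPROCS=4 this makes RunParallel use 4*100 = 400 goroutines.
    b.SetParallelism(100)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}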

Upvotes: 8
