AkiRoss
AkiRoss

Reputation: 12273

How to detect what is preventing multiple cores being used in golang?

So, I have a piece of code that is concurrent and it's meant to be run onto each CPU/core.

There are two large vectors with input/output values

var (
    input = make([]float64, rowCount)
    output = make([]float64, rowCount)
)

these are filled and I want to compute the distance (error) between each input-output pair. Being the pairs independent, a possible concurrent version is the following:

var d float64 // Error to be computed
// Setup a worker "for each CPU"
ch := make(chan float64)
nw := runtime.NumCPU()
for w := 0; w < nw; w++ {
    go func(id int) {
         var wd float64
         // eg nw = 4
         // worker0, i = 0, 4, 8, 12...
         // worker1, i = 1, 5, 9, 13...
         // worker2, i = 2, 6, 10, 14...
         // worker3, i = 3, 7, 11, 15...
         for i := id; i < rowCount; i += nw {
             res := compute(input[i])
             wd += distance(res, output[i])
         }
         ch <- wd
    }(w)
}
// Compute total distance
for w := 0; w < nw; w++ {
    d += <-ch
}

The idea is to have a single worker for each CPU/core, and each worker processes a subset of the rows.

The problem I'm having is that this code is no faster than the serial code.

Now, I'm using Go 1.7 so runtime.GOMAXPROCS should be already set to runtime.NumCPU(), but even setting it explicitly does not improves performances.

So, besides accessing the global data (input/output) for reading, there are no locks/mutexes that I am aware of (not using math/rand, for example).

I also compiled with -race and nothing emerged.

My host has 4 virtual cores, but when I run this code I get (using htop) CPU usage to 102%, but I expected something around 380%, as it happened in the past with other go code that used all the cores.

I would like to investigate, but I don't know how the runtime allocates threads and schedule goroutines.

How can I debug this kind of issues? Can pprof help me in this case? What about the runtime package?

Thanks in advance

Upvotes: 1

Views: 482

Answers (1)

AkiRoss
AkiRoss

Reputation: 12273

Sorry, but in the end I got the measurement wrong. @JimB was right, and I had a minor leak, but not so much to justify a slowdown of this magnitude.

My expectations were too high: the function I was making concurrent was called only at the beginning of the program, therefore the performance improvement was just minor.

After applying the pattern to other sections of the program, I got the expected results. My mistake in evaluation which section was the most important.

Anyway, I learned a lot of interesting things meanwhile, so thanks a lot to all the people trying to help!

Upvotes: 1

Related Questions