Reputation: 75
I'm new to Go and to concurrent/parallel programming in general. In order to try out (and hopefully see the performance benefits of) goroutines, I've put together a small test program that simply generates 100 million random int
s - first in a single goroutine, and then in as many goroutines as reported by runtime.NumCPU()
.
However, I consistently get worse performance using more goroutines than using a single one. I assume I'm missing something vital in either my programs design or the way in which I use goroutines/channels/other Go features. Any feedback is much appreciated.
I attach the code below.
package main
import "fmt"
import "time"
import "math/rand"
import "runtime"
func main() {
// Figure out how many CPUs are available and tell Go to use all of them
numThreads := runtime.NumCPU()
runtime.GOMAXPROCS(numThreads)
// Number of random ints to generate
var numIntsToGenerate = 100000000
// Number of ints to be generated by each spawned goroutine thread
var numIntsPerThread = numIntsToGenerate / numThreads
// Channel for communicating from goroutines back to main function
ch := make(chan int, numIntsToGenerate)
// Slices to keep resulting ints
singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
fmt.Printf("Initiating single-threaded random number generation.\n")
startSingleRun := time.Now()
// Generate all of the ints from a single goroutine, retrieve the expected
// number of ints from the channel and put in target slice
go makeRandomNumbers(numIntsToGenerate, ch)
for i := 0; i < numIntsToGenerate; i++ {
singleThreadIntSlice = append(singleThreadIntSlice,(<-ch))
}
elapsedSingleRun := time.Since(startSingleRun)
fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
fmt.Printf("Initiating multi-threaded random number generation.\n")
startMultiRun := time.Now()
// Run the designated number of goroutines, each of which generates its
// expected share of the total random ints, retrieve the expected number
// of ints from the channel and put in target slice
for i := 0; i < numThreads; i++ {
go makeRandomNumbers(numIntsPerThread, ch)
}
for i := 0; i < numIntsToGenerate; i++ {
multiThreadIntSlice = append(multiThreadIntSlice,(<-ch))
}
elapsedMultiRun := time.Since(startMultiRun)
fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
}
func makeRandomNumbers(numInts int, ch chan int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
for i := 0; i < numInts; i++ {
ch <- generator.Intn(numInts*100)
}
}
Upvotes: 4
Views: 5919
Reputation: 417957
First let's correct and optimize some things in your code:
Since Go 1.5, GOMAXPROCS
defaults to the number of CPU cores available, so no need to set that (although it does no harm).
Numbers to generate:
var numIntsToGenerate = 100000000
var numIntsPerThread = numIntsToGenerate / numThreads
If numThreads
is like 3, in case of multi goroutines, you'll have less numbers generated (due to integer division), so let's correct it:
numIntsToGenerate = numIntsPerThread * numThreads
No need a buffer for 100 million values, reduce that to a sensible value (e.g. 1000):
ch := make(chan int, 1000)
If you want to use append()
, the slices you create should have 0 length (and proper capacity):
singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
multiThreadIntSlice := make([]int, 0, numIntsToGenerate)
But in your case that's unnecessary, as only 1 goroutine is collecting the results, you can simply use indexing, and create slices like this:
singleThreadIntSlice := make([]int, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate)
And when collecting results:
for i := 0; i < numIntsToGenerate; i++ {
singleThreadIntSlice[i] = <-ch
}
// ...
for i := 0; i < numIntsToGenerate; i++ {
multiThreadIntSlice[i] = <-ch
}
Ok. Code is now better. Attempting to run it, you will still experience that the multi-goroutine version runs slower. Why is that?
It's because controlling, synchronizing and collecting results from multiple goroutines does have overhead. If the task they perform is little, the communication overhead will be greater and overall you lose performance.
Your case is such a case. Generating a single random number once you set up your rand.Rand()
is pretty fast.
Let's modify your "task" to be big enough so that we can see the benefit of multiple goroutines:
// 1 million is enough now:
var numIntsToGenerate = 1000 * 1000
func makeRandomNumbers(numInts int, ch chan int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
for i := 0; i < numInts; i++ {
// Kill time, do some processing:
for j := 0; j < 1000; j++ {
generator.Intn(numInts * 100)
}
// and now return a single random number
ch <- generator.Intn(numInts * 100)
}
}
In this case to get a random number, we generate 1000 random numbers and just throw them away (to make some calculation / kill time) before we generate the one we return. We do this so that the calculation time of the worker goroutines outweights the communication overhead of multiple goroutines.
Running the app now, my results on a 4-core machine:
Initiating single-threaded random number generation.
Single-threaded run took 2.440604504s
Initiating multi-threaded random number generation.
Multi-threaded run took 987.946758ms
The multi-goroutine version runs 2.5 times faster. This means if your goroutines would deliver random numbers in 1000-blocks, you would see 2.5 times faster execution (compared to the single goroutine generation).
One last note:
Your single-goroutine version also uses multiple goroutines: 1 to generate numbers and 1 to collect the results. Most likely the collector does not fully utilize a CPU core and mostly just waits for the results, but still: 2 CPU cores are used. Let's estimate that "1.5" CPU cores are utilized. While the multi-goroutine version utilizes 4 CPU cores. Just as a rough estimation: 4 / 1.5 = 2.66, very close to our performance gain.
Upvotes: 6
Reputation: 33657
If you really want to generate the random numbers in parallel then each task should be about generate the numbers and then return them in one go rather than the task being generate one number at a time and feed them to a channel as that reading and writing to channel will slow things down in multi go routine case. Below is the modified code where then task generate the required numbers in one go and this performs better in multi go routines case, also I have used slice of slices to collect the result from multi go routines.
package main
import "fmt"
import "time"
import "math/rand"
import "runtime"
func main() {
// Figure out how many CPUs are available and tell Go to use all of them
numThreads := runtime.NumCPU()
runtime.GOMAXPROCS(numThreads)
// Number of random ints to generate
var numIntsToGenerate = 100000000
// Number of ints to be generated by each spawned goroutine thread
var numIntsPerThread = numIntsToGenerate / numThreads
// Channel for communicating from goroutines back to main function
ch := make(chan []int)
fmt.Printf("Initiating single-threaded random number generation.\n")
startSingleRun := time.Now()
// Generate all of the ints from a single goroutine, retrieve the expected
// number of ints from the channel and put in target slice
go makeRandomNumbers(numIntsToGenerate, ch)
singleThreadIntSlice := <-ch
elapsedSingleRun := time.Since(startSingleRun)
fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
fmt.Printf("Initiating multi-threaded random number generation.\n")
multiThreadIntSlice := make([][]int, numThreads)
startMultiRun := time.Now()
// Run the designated number of goroutines, each of which generates its
// expected share of the total random ints, retrieve the expected number
// of ints from the channel and put in target slice
for i := 0; i < numThreads; i++ {
go makeRandomNumbers(numIntsPerThread, ch)
}
for i := 0; i < numThreads; i++ {
multiThreadIntSlice[i] = <-ch
}
elapsedMultiRun := time.Since(startMultiRun)
fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
//To avoid not used warning
fmt.Print(len(singleThreadIntSlice))
}
func makeRandomNumbers(numInts int, ch chan []int) {
source := rand.NewSource(time.Now().UnixNano())
generator := rand.New(source)
result := make([]int, numInts)
for i := 0; i < numInts; i++ {
result[i] = generator.Intn(numInts * 100)
}
ch <- result
}
Upvotes: 1