roger
roger

Reputation: 9943

Reasonable use of goroutines in Go programs

My program has a long running task. I have a list jdIdList that is too big - up to 1000000 items, so the code below doesn't work. Is there a way to improve the code with better use of goroutines?

It seems I have too many goroutines running which makes my code fail to run.

What is a reasonable number of goroutines to have running?

var wg sync.WaitGroup
wg.Add(len(jdIdList))
c := make(chan string)

// just think jdIdList as [0...1000000]
for _, jdId := range jdIdList {
    go func(jdId string) {
        defer wg.Done()
        for _, itemId := range itemIdList {
            // following code is doing some computation which consumes much time(you can just replace them with time.Sleep(time.Second * 1)
            cvVec, ok := cvVecMap[itemId]
            if !ok {
                continue
            }
            jdVec, ok := jdVecMap[jdId]
            if !ok {
                continue
            }
            // long time compute
            _ = 0.3*computeDist(jdVec.JdPosVec, cvVec.CvPosVec) + 0.7*computeDist(jdVec.JdDescVec, cvVec.CvDescVec)
        }
        c <- fmt.Sprintf("done %s", jdId)
    }(jdId)

}

go func() {
    for resp := range c {
        fmt.Println(resp)
    }
}()

Upvotes: 0

Views: 427

Answers (1)

Drathier
Drathier

Reputation: 14539

It looks like you're running too many things concurrently, making your computer run out of memory.

Here's a version of your code that uses a limited number of worker goroutines instead of a million goroutines as in your example. Since only a few goroutines run at once, they have much more memory available each before the system starts to swap. Make sure the memory each small computation requires times the number of concurrent goroutines is less than the memory you have in your system, so if the code inside for jdId := range work loop requires less than 1GB memory, and you have 4 cores and at least 4 GB of RAM, setting clvl to 4 should work fine.

I also removed the waitgroups. The code is still correct, but only uses channels for synchronization. A for range loop over a channel reads from that channel until it is closed. This is how we tell the worker threads when we are done.

https://play.golang.org/p/Sy3i77TJjA

runtime.GOMAXPROCS(runtime.NumCPU()) // not needed on go 1.5 or later
c := make(chan string)
work := make(chan int, 1) // increasing 1 to a higher number will probably increase performance
clvl := 4 // runtime.NumCPU() // simulating having 4 cores, use NumCPU otherwise
var wg sync.WaitGroup
wg.Add(clvl)
for i := 0; i < clvl; i++ {
    go func(i int) {
        for jdId := range work {
            time.Sleep(time.Millisecond * 100)
            c <- fmt.Sprintf("done %d", jdId)
        }
        wg.Done()
    }(i)
}

// give workers something to do
go func() { 
    for i := 0; i < 10; i++ {
        work <- i
    }

    close(work)
}()

// close output channel when all workers are done
go func() { 
    wg.Wait()
    close(c)
}()

count := 0
for resp := range c {
    fmt.Println(resp, count)
    count += 1
}

which generated this output on go playground, while simulating four cpu cores.

done 1 0
done 0 1
done 3 2
done 2 3
done 5 4
done 4 5
done 7 6
done 6 7
done 9 8
done 8 9

Notice how the ordering is not guaranteed. The jdId variable holds the value you want. You should always test your concurrent programs using the go race detector.

Also note that if you are using go 1.4 or earlier and haven't set the GOMAXPROCS environment variable to the number of cores, you should do that, or add runtime.GOMAXPROCS(runtime.NumCPU()) to the beginning of your program.

Upvotes: 3

Related Questions