Golang: Producer/Consumer concurrency model but with serialized results

Question

func main() {
    jobs := []Job{job1, job2, job3}
    numOfJobs := len(jobs)
    resultsChan := make(chan *Result, numOfJobs)
    jobChan := make(chan *job, numOfJobs)
    go consume(numOfJobs, jobChan, resultsChan)
    for i := 0; i < numOfJobs; i++ {
        jobChan <- jobs[i]
    }
    close(jobChan)

    for i := 0; i < numOfJobs; i++ {
        <-resultsChan
    }
    close(resultsChan)
}

func (b *Blockchain) consume(num int, jobChan chan *Job, resultsChan chan *Result) {
    for i := 0; i < num; i++ {
        go func() {
            job := <-jobChan
            resultsChan <- doJob(job)
        }()
    }
}

In the above example, jobs are pushed into the jobChan and goroutines will pull it off the jobChan and execute the jobs concurrently and push results into resultsChan. We will then pull results out of resultsChan.

Question 1:

In my code, there is no serialized/linearilized results. Although jobs go in the order of job1, job2, job3. The results might come out as job3, job1, job2, depending which one takes the longest.

I would still like to execute the jobs concurrently, however, I need to make sure that results come out of the resultsChan in the same order that it went in as jobs.

Question2:

I have approximately 300k jobs, this means the code will generate up to 300k goroutines. Is this efficient to have so many goroutines or would I be better off group the jobs together in a slice of 100 or so and have each goroutine go through 100 rather than 1.

twotwotwo · Accepted Answer

Here's a way I've handled serialization (and also setting a limited number of workers). I set some worker objects with input and output fields and synchronization channels, then I go round-robin through them, picking up any work they've done and giving them a new job. Then I make one final pass through them to pick up any completed jobs that are left over. Note you might want the worker count to exceed your core count somewhat, so that you can keep all resources busy for a bit even when there's one unusually long job. Code is at http://play.golang.org/p/PM9y4ieMxw and below.

This is hairy (hairier than I remember it being before sitting down to write an example!)--would love to see what anyone else has, either just better implementations or a whole different way to accomplish your goal.

package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

type Worker struct {
    in     int
    out    int
    inited bool

    jobReady chan bool
    done     chan bool
}

func (w *Worker) work() {
    time.Sleep(time.Duration(rand.Float32() * float32(time.Second)))
    w.out = w.in + 1000
}
func (w *Worker) listen() {
    for <-w.jobReady {
        w.work()
        w.done <- true
    }
}
func doSerialJobs(in chan int, out chan int) {
    concurrency := 23
    workers := make([]Worker, concurrency)
    i := 0
    // feed in and get out items
    for workItem := range in {
        w := &workers[i%
            concurrency]
        if w.inited {
            <-w.done
            out <- w.out
        } else {
            w.jobReady = make(chan bool)
            w.done = make(chan bool)
            w.inited = true
            go w.listen()
        }
        w.in = workItem
        w.jobReady <- true
        i++
    }
    // get out any job results left over after we ran out of input
    for n := 0; n < concurrency; n++ {
        w := &workers[i%concurrency]
        if w.inited {
            <-w.done
            out <- w.out
        }
        close(w.jobReady)
        i++
    }
    close(out)
}
func main() {
    runtime.GOMAXPROCS(10)
    in, out := make(chan int), make(chan int)
    allFinished := make(chan bool)
    go doSerialJobs(in, out)
    go func() {
        for result := range out {
            fmt.Println(result)
        }
        allFinished <- true
    }()
    for i := 0; i < 100; i++ {
        in <- i
    }
    close(in)
    <-allFinished
}

Note that only in and out in this example carry actual data--all the other channels are just for synchronization.

Golang: Producer/Consumer concurrency model but with serialized results

Answers (1)

Related Questions