The idiomatic way to implement generators (yield) in Golang for recursive functions

[ Note: I read Python-style generators in Go, this is not a duplicate of it. ]

In Python / Ruby / JavaScript / ECMAScript 6, generator functions could be written using the yield keyword provided by the language. In Go, it could be simulated using a goroutine and a channel.

The Code

The following code shows how a permutation function (abcd, abdc, acbd, acdb, ..., dcba) could be implemented:

// $src/lib/lib.go

package lib

// private, starts with lowercase "p"
func permutateWithChannel(channel chan<- []string, strings, prefix []string) {
    length := len(strings)
    if length == 0 {
        // Base case
        channel <- prefix
        return
    }
    // Recursive case
    newStrings := make([]string, 0, length-1)
    for i, s := range strings {
        // Remove strings[i] and assign the result to newStringI
        // Append strings[i] to newPrefixI
        // Call the recursive case
        newStringsI := append(newStrings, strings[:i]...)
        newStringsI = append(newStringsI, strings[i+1:]...)
        newPrefixI := append(prefix, s)
        permutateWithChannel(channel, newStringsI, newPrefixI)
    }
}

// public, starts with uppercase "P"
func PermutateWithChannel(strings []string) chan []string {
    channel := make(chan []string)
    prefix := make([]string, 0, len(strings))
    go func() {
        permutateWithChannel(channel, strings, prefix)
        close(channel)
    }()
    return channel
}

Here is how it could be used:

// $src/main.go

package main

import (
    "./lib"
    "fmt"
)

var (
    fruits  = []string{"apple", "banana", "cherry", "durian"}
    banned = "durian"
)

func main() {
    channel := lib.PermutateWithChannel(fruits)
    for myFruits := range channel {
        fmt.Println(myFruits)
        if myFruits[0] == banned {
            close(channel)
            //break
        }
    }
}

Note:

The break statement (commented above) is not needed, as close(channel) causes range to return false in the next iteration, the loop will terminate.

The Problem

If the caller doesn't need all the permutations, it needs to close() the channel explicitly, or the channel will not be closed until the program terminates (resource leakage occurs). On the other hand, if the caller needs all the permutations (i.e. the range loops until the end), the caller MUST NOT close() the channel. It is because close()-ing an already closed channel causes a runtime panic (see here in the spec). However, if the logic to determine whether it should stop or not is not as simple as shown above, I think it is better to use defer close(channel).

The Questions

  1. What is the idiomatic way to implement generators like this?
  2. Idiomatically, who should be responsible to close() the channel - the library function or the caller?
  3. Is it a good idea to modify my code like below, so that the caller is responsible to defer close() the channel no matter what?

In the library, modify this:

    go func() {
        permutateWithChannel(channel, strings, prefix)
        close(channel)
    }()

to this:

    go permutateWithChannel(channel, strings, prefix)

In the caller, modify this:

func main() {
    channel := lib.PermutateWithChannel(fruits)
    for myFruits := range channel {
        fmt.Println(myFruits)
        if myFruits[0] == banned {
            close(channel)
        }
    }
}

to this:

func main() {
    channel := lib.PermutateWithChannel(fruits)
    defer close(channel)    // <- Added
    for myFruits := range channel {
        fmt.Println(myFruits)
        if myFruits[0] == banned {
            break           // <- Changed
        }
    }
}
  1. Despite it is not observable by executing the code above, and the correctness of the algorithm is not affected, after the caller close()s the channel, the goroutine running the library code should panic when it tries to send to the closed channel in the next iteration, as documented here in the spec, causing it to terminate. Does this cause any negative side-effect?
  2. The signature of the library function is func(strings []string) chan []string. Ideally, the return type should be <-chan []string to restrict it to be receive-only. However, if it is the caller who is responsible to close() the channel, it could not be marked as "receive-only", as the close() built-in function does not work on receive-only channels. What is the idiomatic way to deal with this?

Upvotes: 55

Views: 54653

Answers (3)

gonzojive
gonzojive

Reputation: 2216

I agree with icza's answer. To summarize, there are are two alternatives:

  1. mapping function: Use a callback to iterate over a collection. func myIterationFn(yieldfunc (myType)) (stopIterating bool). This has the disadvantage of ceding the control flow to myGenerator function. myIterationFn is not a Pythonic generator because it doesn't return an iterable sequence.
  2. channels: Use a channel and be wary of leaking goroutines. It's possible to transform myIterationFn into a function that returns an iterable sequence. The following code provides an example of such a transformation.
myMapper := func(yield func(int) bool) {
    for i := 0; i < 5; i++ {
        if done := yield(i); done {
            return
        }
    }
}
iter, cancel := mapperToIterator(myMapper)
defer cancel() // This line is very important - it prevents goroutine leaks.
for value, ok := iter(); ok; value, ok = iter() {
    fmt.Printf("value: %d\n", value)
}

Here's a complete program as an example. mapperToIterator does the transformation from a mapping function to a generator. Go's lack of generics requires casting from interface{} to int.

package main

import "fmt"

// yieldFn reports true if an iteration should continue. It is called on values
// of a collection.
type yieldFn func(interface{}) (stopIterating bool)

// mapperFn calls yieldFn for each member of a collection.
type mapperFn func(yieldFn)

// iteratorFn returns the next item in an iteration or the zero value. The
// second return value is true when iteration is complete.
type iteratorFn func() (value interface{}, done bool)

// cancelFn should be called to clean up the goroutine that would otherwise leak.
type cancelFn func()

// mapperToIterator returns an iteratorFn version of a mappingFn. The second
// return value must be called at the end of iteration, or the underlying
// goroutine will leak.
func mapperToIterator(m mapperFn) (iteratorFn, cancelFn) {
    generatedValues := make(chan interface{}, 1)
    stopCh := make(chan interface{}, 1)
    go func() {
        m(func(obj interface{}) bool {
            select {
            case <-stopCh:
                return false
            case generatedValues <- obj:
                return true
            }
        })
        close(generatedValues)
    }()
    iter := func() (value interface{}, notDone bool) {
        value, notDone = <-generatedValues
        return
    }
    return iter, func() {
        stopCh <- nil
    }
}

func main() {
    myMapper := func(yield yieldFn) {
        for i := 0; i < 5; i++ {
            if keepGoing := yield(i); !keepGoing {
                return
            }
        }
    }
    iter, cancel := mapperToIterator(myMapper)
    defer cancel()
    for value, notDone := iter(); notDone; value, notDone = iter() {
        fmt.Printf("value: %d\n", value.(int))
    }
}

Upvotes: 3

icza
icza

Reputation: 417777

I. Alternatives

Foreword: I will use a much simpler generator, because the problem does not concern the generator complexity but rather the signals between the generator and consumer, and the call of the consumer itself. This simple generator just generates the integer numbers from 0 to 9.

1. With a function value

A generate-consumer pattern is much cleaner with a simple consumer function passed, which also has the advantage that it can return a value signalling if abortion or any other action is required.

And since in the example only one event is to be signaled ("abort"), the consumer function will have bool return type, signalling if abort is required.

So see this simple example with a consumer function value passed to the generator:

func generate(process func(x int) bool) {
    for i := 0; i < 10; i++ {
        if process(i) {
            break
        }
    }
}

func main() {
    process := func(x int) bool {
        fmt.Println("Processing", x)
        return x == 3 // Terminate if x == 3
    }
    generate(process)
}

Output (try it on the Go Playground):

Processing 0
Processing 1
Processing 2
Processing 3

Note that the consumer (process) does not need to be a "local" function, it can be declared outside of main(), e.g. it can be a global function or a function from another package.

The potential downside of this solution is that it uses only 1 goroutine both for generating and consuming values.

2. With channels

If you still want to do it with channels, you can. Note that since the channel is created by the generator, and since the consumer loops over the values received from the channel (ideally with a for ... range construct), it is the generator's responsibility to close the channel. Settling with this also allows you to return a receive-only channel.

And yes, closing the returned channel in the generator is best done as a deferred statement, so even if the generator panics, the consumer will not get blocked. But note that this deferred close is not in the generate() function but in the anonymous function started from generate() and executed as a new goroutine; else the channel would be closed before it is returned from generate() - not useful at all...

And if you want to signal the generator from the consumer (e.g. to abort and not generate further values), you can use e.g. another channel, which is passed to the generator. Since the generator will only "listen" to this channel, it can also be declared as a receive-only channel to the generator. If you only need to signal one event (abort in our case), no need to send any values on this channel, a simple close will do it. If you need to signal multiple events, it can be done by actually sending a value on this channel, the event / action to be carried out (where abort may be one from multiple events).

And you can use the select statement as the idiomatic way to handle sending values on the returned channel and watching the channel passed to the generator.

Here is a solution with an abort channel:

func generate(abort <-chan struct{}) <-chan int {
    ch := make(chan int)
    go func() {
        defer close(ch)
        for i := 0; i < 10; i++ {
            select {
            case ch <- i:
                fmt.Println("Sent", i)
            case <-abort: // receive on closed channel can proceed immediately
                fmt.Println("Aborting")
                return
            }
        }
    }()
    return ch
}

func main() {
    abort := make(chan struct{})
    ch := generate(abort)
    for v := range ch {
        fmt.Println("Processing", v)
        if v == 3 { // Terminate if v == 3
            close(abort)
            break
        }
    }
    // Sleep to prevent termination so we see if other goroutine panics
    time.Sleep(time.Second)
}

Output (try it on the Go Playground):

Sent 0
Processing 0
Processing 1
Sent 1
Sent 2
Processing 2
Processing 3
Sent 3
Aborting

The obvious advantage of this solution is that it already uses 2 goroutines (1 that generates values, 1 that consumes/processes them), and it is very easy to extend it to process the generated values with any number of goroutines as the channel returned by the generator can be used from multiple goroutines concurrently - channels are safe to be receiving from concurrently, data races cannot occur, by design; for more read: If I am using channels properly should I need to use mutexes?

II. Answers to unaddressed questions

An "uncaught" panic on a goroutine will end the execution of the goroutine but will not cause a problem in regards to resource leak. But if the function executed as a separate goroutine would free resources (in non-deferred statements) allocated by it in case of non-panic, that code will obviously not run and will cause resource leak for example.

You haven't observed this because the program terminates when the main goroutine terminates (and it does not wait for other non-main goroutines to finish - so your other goroutines did not get a chance to panic). See Spec: Program execution.

But know that panic() and recover() are for exceptional cases, they are not intended for such general use cases like the Exceptions and try-catch blocks in Java. Panics should be avoided, by returning errors (and handling them!) for example, and panics should definitely not leave the "borders" of packages (e.g. panic() and recover() may be justified to be used in a package implementation, but panicking state should be "caught" inside the package and not let out of it).

Upvotes: 46

Uvelichitel
Uvelichitel

Reputation: 8490

To my mind usually generators are just wrappers around closure internally. Something like this

package main

import "fmt"

// This function `generator` returns another function, which
// we define anonymously in the body of `generator`. The
// returned function _closes over_ the variable `data` to
// form a closure.
func generator(data int, permutation func(int) int, bound int) func() (int, bool) {
    return func() (int, bool) {
        data = permutation(data)
        return data, data < bound
    }
}

// permutation function
func increment(j int) int {
    j += 1
    return j
}

func main() {
    // We call `generator`, assigning the result (a function)
    // to `next`. This function value captures its
    // own `data` value, which will be updated each time
    // we call `next`.
    next := generator(1, increment, 7)
    // See the effect of the closure by calling `next`
    // a few times.
    fmt.Println(next())
    fmt.Println(next())
    fmt.Println(next())
    // To confirm that the state is unique to that
    // particular function, create and test a new one.
    for next, generation, ok := generator(11, increment, 17), 0, true; ok; {
        generation, ok = next()
        fmt.Println(generation)
    }
}

It looks not as elegant as 'for range' but quite clear semantically and syntactically for me. And it works http://play.golang.org/p/fz8xs0RYz9

Upvotes: 10

Related Questions