mwalto7
mwalto7

Reputation: 307

Golang Syncing Goroutines

I am working on an SSH client application for configuring network devices concurrently, and I am running into issues implementing the concurrency. My program takes in a slice of hosts and a slice of config commands to send to each host. I am using a sync.WaitGroup to wait for all the hosts to finish being configured. This works fine for small batches of hosts, but soon the functions within my configuration goroutines start randomly failing. If I rerun the program on the failed hosts, some will succeed and again some will fail. I have to repeat this process until only the hosts with actual errors remain. It always fails either on the authentication saying authentication failed: auth methods tried [none password]... or the values from sysDescr don't get added to some Devices fields. It's as if when there are many hosts and goroutines running, they start returning early or something. I'm really not sure what is going on.

Here is a sample of my code:

package main

import (
    "fmt"
    "net"
    "os"
    "sync"
    "time"

    "golang.org/x/crypto/ssh"
)

func main() {
    // When there are many hosts, many calls to Dial and
    // sysDescr fail. If I rerun the program on the unsuccessful
    // hosts, nothing fails and the expected output is produced.
    var hosts []string
    cfg := &ssh.ClientConfig{
        User:            "user",
        Auth:            []ssh.AuthMethod{ssh.Password("pass")},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
        Timeout:         10 * time.Second,
    }
    results := make(chan *result, len(hosts))

    var wg sync.WaitGroup
    wg.Add(len(hosts))
    for _, host := range hosts {
        go connect(host, cfg, results, &wg)
    }
    wg.Wait()
    close(results)

    for res := range results {
        if res.err != nil {
            fmt.Fprintln(os.Stderr, res.Err)
            continue
        }
        d := res.device
        fmt.Println(d.addr, d.hostname, d.vendor, d.os, d.model, d.version)
        d.Client.Close()
    }
}

// Device represents a network device.
type Device struct {
    *ssh.Client
    addr     string
    hostname string
    vendor   string
    os       string
    model    string
    version  string
}

// Dial establishes an ssh client connection to a remote host.
func Dial(host, port string, cfg *ssh.ClientConfig) (*Device, error) {
    // get host info in background, may take a second
    info := make(chan map[string]string)
    go func(c *Client) {
        info <- sysDescr(host)
        close(info)
    }(c)

    // establish ssh client connection to host
    client, err := ssh.Dial("tcp", net.JoinHostPort(host, addr), cfg)
    if err != nil {
        return nil, err
    }

    m := <-info
    d := &Device{
        Client: client,
        addr: m["addr"],
        hostname: m["hostname"],
        vendor: m["vendor"],
        os: m["os"],
        model: m["model"],
        version: m["version"],
    }
    return d, nil
}

// sysDescr attempts to gather information about a remote host.
func sysDescr(host string) map[string]string {
    // get host info
}

// result is the result of connecting to a device.
type result struct {
    device *Device
    err    error
}

// connect establishes an ssh client connection to a host.
func connect(host string, cfg *ssh.ClientConfig, results chan<- *result, wg *sync.WaitGroup) {
    defer wg.Done()
    device, err := Dial(host, "22", cfg)
    results <- &result{device, err}
}

Am I doing something wrong? Can I limit the number of goroutines being spawned instead of spawning a goroutine for each host?

Upvotes: 0

Views: 807

Answers (1)

dm03514
dm03514

Reputation: 55962

Yes! To answer your second question, there are many patterns in go to limit concurrency. Two of the largest are:

I personally prefer the worker pool variant because IMO it better helps to keep the business logic separate from the scheduling, but both are completely valid and are present out in the wild in many many projects.

Upvotes: 1

Related Questions