Reputation: 307
I am working on an SSH client application for configuring network devices concurrently, and I am running into issues implementing the concurrency. My program takes in a slice of hosts and a slice of config commands to send to each host. I am using a sync.WaitGroup
to wait for all the hosts to finish being configured. This works fine for small batches of hosts, but soon the functions within my configuration goroutines start randomly failing. If I rerun the program on the failed hosts, some will succeed and again some will fail. I have to repeat this process until only the hosts with actual errors remain. It always fails either on the authentication saying authentication failed: auth methods tried [none password]...
or the values from sysDescr
don't get added to some Device
s fields. It's as if when there are many hosts and goroutines running, they start returning early or something. I'm really not sure what is going on.
Here is a sample of my code:
package main
import (
"fmt"
"net"
"os"
"sync"
"time"
"golang.org/x/crypto/ssh"
)
func main() {
// When there are many hosts, many calls to Dial and
// sysDescr fail. If I rerun the program on the unsuccessful
// hosts, nothing fails and the expected output is produced.
var hosts []string
cfg := &ssh.ClientConfig{
User: "user",
Auth: []ssh.AuthMethod{ssh.Password("pass")},
HostKeyCallback: ssh.InsecureIgnoreHostKey(),
Timeout: 10 * time.Second,
}
results := make(chan *result, len(hosts))
var wg sync.WaitGroup
wg.Add(len(hosts))
for _, host := range hosts {
go connect(host, cfg, results, &wg)
}
wg.Wait()
close(results)
for res := range results {
if res.err != nil {
fmt.Fprintln(os.Stderr, res.Err)
continue
}
d := res.device
fmt.Println(d.addr, d.hostname, d.vendor, d.os, d.model, d.version)
d.Client.Close()
}
}
// Device represents a network device.
type Device struct {
*ssh.Client
addr string
hostname string
vendor string
os string
model string
version string
}
// Dial establishes an ssh client connection to a remote host.
func Dial(host, port string, cfg *ssh.ClientConfig) (*Device, error) {
// get host info in background, may take a second
info := make(chan map[string]string)
go func(c *Client) {
info <- sysDescr(host)
close(info)
}(c)
// establish ssh client connection to host
client, err := ssh.Dial("tcp", net.JoinHostPort(host, addr), cfg)
if err != nil {
return nil, err
}
m := <-info
d := &Device{
Client: client,
addr: m["addr"],
hostname: m["hostname"],
vendor: m["vendor"],
os: m["os"],
model: m["model"],
version: m["version"],
}
return d, nil
}
// sysDescr attempts to gather information about a remote host.
func sysDescr(host string) map[string]string {
// get host info
}
// result is the result of connecting to a device.
type result struct {
device *Device
err error
}
// connect establishes an ssh client connection to a host.
func connect(host string, cfg *ssh.ClientConfig, results chan<- *result, wg *sync.WaitGroup) {
defer wg.Done()
device, err := Dial(host, "22", cfg)
results <- &result{device, err}
}
Am I doing something wrong? Can I limit the number of goroutines being spawned instead of spawning a goroutine for each host?
Upvotes: 0
Views: 807
Reputation: 55962
Yes! To answer your second question, there are many patterns in go to limit concurrency. Two of the largest are:
I personally prefer the worker pool variant because IMO it better helps to keep the business logic separate from the scheduling, but both are completely valid and are present out in the wild in many many projects.
Upvotes: 1