androidGuy

Reputation: 5643

Error: ssh: handshake failed: read tcp read: connection reset by peer

I am trying to connect to my vendor's SFTP server using the following golang code.

    config := &ssh.ClientConfig{
        User:            user,
        Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
    }
    config.SetDefaults()

    // connect
    conn, err = ssh.Dial(network, address+port, config)
    if err != nil {
        return err
    }
    defer conn.Close()

I am randomly getting the error ssh: handshake failed: read tcp <ip_address_here>:<port_here>-><ftp_ip_address_here>:<ftp_port_here>: read: connection reset by peer from ssh.Dial(). It works 2 out of 5 times but fails the other 3 times. Could this be due to a high number of connection requests that the SFTP server is getting from other clients? And is retrying with exponential backoff recommended in these cases?

Upvotes: 2

Views: 7928

Answers (1)

VonC

Reputation: 1323115

Could this be due to a high number of connection requests that the SFTP server is getting from other clients?

That is indeed suggested by golang/go issue 20960 "net/http: read: connection reset by peer under high load" (that issue concerns HTTPS, but the same limits could apply to SSH connections as well):

If you get broken network connections, they might be real (a peer really did disconnect), or they might be from hitting kernel limits on one of the sides.

Example:

I am running 1000 concurrent connections and it eventually hits this also on OSX, with file descriptors bumped up from the default (256) to 4096.

The server configuration would matter.

Or:

I also see this issue under macOS where server and client talk over localhost. I don't know the exact cause of the problem but running netstat during high load displays very large number of connections in TIME_WAIT state.

Either I exhaust file descriptors or local ports.
Or both.

I keep seeing connection reset by peer until the number of TIME_WAIT connections is about 7k. If I let the test run even further, I eventually get connect: can't assign requested address error which suggests that I exhausted local ports.
The number of TIME_WAIT connections is at 15k.

For my project that's not needed and I could solve the problem by limiting number of concurrent connections. Solving the problem in my code by limiting number of goroutines seemed arbitrary and still occasionally failed.
Limiting MaxConnsPerHost in http transport did the trick and problem went away completely.

Upvotes: 2
