Stan Sieler

Why does scp sporadically fail, when doing multiple scps in parallel?

I have a small application that runs a dozen "scp" commands in parallel, pulling files from a remote system. Usually it runs fine. Sometimes, though, one or two of the scp runs quietly die. ("Quietly" if pulling from Linux; if pulling from HP-UX, I get a message like "Connection reset by peer".)

If I add "-v" to my scp commands, then when a failure occurs I see "ssh_exchange_identification: read: Connection reset by peer" (on Linux; I haven't tried -v on HP-UX).

Here's the "scp -v" output for a typical run, with the point where a 'bad' run and a 'good' run diverge marked:

Executing: program /usr/bin/ssh host wilbur, user (unspecified), command scp -v -p -f /home/sieler/source/misc/[p-q]*.[ch]
OpenSSH_6.9p1, LibreSSL 2.1.8
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 51: Applying options for *
debug1: Connecting to wilbur [10.84.3.61] port 22.
debug1: Connection established.
debug1: identity file /Users/sieler/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/sieler/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.9

'bad' and 'good' runs match up to this point, then...

Bad:

ssh_exchange_identification: read: Connection reset by peer

Good:

debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH_5* compat 0x0c000000
debug1: Authenticating to wilbur:22 as 'sieler'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr [email protected] none
debug1: kex: client->server aes128-ctr [email protected] none
...

Although the usual host machine for the script and scp runs is a Mac running OS X 10.11.4, the problem has been reproduced to/from several combinations of Mac/Linux/HP-UX (enough to rule out a Mac- or HP-UX-specific problem).

IIRC, the problem has occurred when using scp to pull from Linux to Mac, from HP-UX to Mac, and from Linux to HP-UX.
I haven't tried pulling from Mac or HP-UX to Linux.

Is there something about scp/ssh/OpenSSH that makes parallel usage sometimes fail?

If I run sshd on the Linux system with -ddd, the daemon exits after the first scp connects to it (that scp works fine), and the other eleven scp runs fail.

Thanks

Answers (1)

Jakuje

This is probably caused by the limit on concurrent unauthenticated connections in sshd_config. By default, the server is configured to do "random early drop", which means it starts refusing some new connections once the number of unauthenticated connections reaches a threshold. The responsible option is MaxStartups (from man sshd_config):

MaxStartups

Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10:30:100.

Alternatively, random early drop can be enabled by specifying the three colon separated values “start:rate:full” (e.g. "10:30:60"). sshd(8) will refuse connection attempts with a probability of “rate/100” (30%) if there are currently “start” (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches “full” (60).
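To see why only one or two of a dozen simultaneous scp runs fail, here is a minimal Python sketch of the random early drop rule as the man page above describes it. The function names are hypothetical, and the sketch assumes the worst case, in which all connections arrive before any of them finishes authenticating. It follows the man page's linear description; sshd's own arithmetic may round the intermediate probabilities slightly differently.

```python
from __future__ import annotations


def parse_max_startups(spec: str) -> tuple[int, int, int]:
    """Split a MaxStartups value into (start, rate, full).

    A single number N is modeled here as a hard limit (start = full = N),
    matching the man page's "additional connections will be dropped".
    """
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts[0], 100, parts[0]
    if len(parts) == 3:
        start, rate, full = parts
        return start, rate, full
    raise ValueError(f"unexpected MaxStartups value: {spec!r}")


def drop_probability(unauth: int, start: int, rate: int, full: int) -> float:
    """Chance that a new connection is refused when `unauth` unauthenticated
    connections are already open (linear between start and full)."""
    if unauth < start:
        return 0.0
    if unauth >= full:
        return 1.0
    fraction = (unauth - start) / (full - start)
    return rate / 100 + (1 - rate / 100) * fraction


def prob_no_drop(connections: int, start: int, rate: int, full: int) -> float:
    """Chance that all `connections` simultaneous attempts get through,
    assuming none finishes authenticating before the last one arrives."""
    p = 1.0
    for already_open in range(connections):
        # Given every earlier attempt was accepted, `already_open` are pending.
        p *= 1.0 - drop_probability(already_open, start, rate, full)
    return p


if __name__ == "__main__":
    start, rate, full = parse_max_startups("10:30:100")
    for n in (9, 10, 11):
        print(n, round(drop_probability(n, start, rate, full), 3))
    print(f"all 12 accepted: {prob_no_drop(12, start, rate, full):.2f}")
```

Under these assumptions, with 10:30:100 and 12 simultaneous connections, the first ten are always accepted, the 11th and 12th each face roughly a 30% chance of being dropped, and all twelve get through only about 48% of the time. That is consistent with the sporadic "Connection reset by peer" failures in the question (older servers may use a different default, so check the server's own sshd_config man page).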

Raising the value above the number of simultaneous connections you expect should solve your problem. Otherwise, you can set LogLevel DEBUG3 in sshd_config to get more detail in the system log.


However, since you are connecting to the same server every time, it is better to use connection multiplexing: only one connection has to be established and authenticated, so it is faster and you will not run into these problems. Check out the ControlMaster option in ssh_config, or see my similar answer for a quick introduction to this "magic".
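A minimal Python sketch of the multiplexing approach, assuming OpenSSH on both ends: one master connection is opened with ssh -M, each scp reuses it through the ControlPath option (passed with -o, since scp's own -S flag means something else), and the master is shut down with ssh -O exit. The helper names, the socket path, and the max_parallel cap of 8 are hypothetical choices. The cap is there because every multiplexed scp still opens its own session on the server, and sshd's MaxSessions setting (10 by default, as far as I know) limits sessions per connection.

```python
from __future__ import annotations

import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical socket location; ssh expands %r (remote user), %h (host), %p (port).
CONTROL_PATH = os.path.expanduser("~/.ssh/cm-%r@%h:%p")


def master_cmd(host: str, control_path: str = CONTROL_PATH) -> list[str]:
    # -M: master mode, -N: no remote command, -f: background after authenticating.
    return ["ssh", "-M", "-N", "-f", "-o", f"ControlPath={control_path}", host]


def scp_cmd(host: str, remote: str, local_dir: str,
            control_path: str = CONTROL_PATH) -> list[str]:
    # scp hands -o options to ssh, so this copy rides on the master's connection.
    return ["scp", "-p", "-o", f"ControlPath={control_path}",
            f"{host}:{remote}", local_dir]


def exit_cmd(host: str, control_path: str = CONTROL_PATH) -> list[str]:
    # Ask the running master to shut down.
    return ["ssh", "-o", f"ControlPath={control_path}", "-O", "exit", host]


def pull_all(host: str, remotes: list[str], local_dir: str,
             max_parallel: int = 8) -> list[int]:
    """Run one scp per remote path over a single shared connection.
    Returns each scp's exit status, in the order of `remotes`."""
    subprocess.run(master_cmd(host), check=True)  # one TCP connect + one auth
    try:
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            results = pool.map(
                lambda r: subprocess.run(scp_cmd(host, r, local_dir)).returncode,
                remotes)
            return list(results)
    finally:
        subprocess.run(exit_cmd(host))
```

For example, pull_all("wilbur", ["/home/sieler/source/misc/[p-q]*.[ch]"], ".") would authenticate once and then run the copy over the existing connection. Because each remote path is passed as a single argument, the wildcard is expanded on the remote side rather than by a local shell.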
