Reputation: 1
Ok, so it's possible that the answer to this question is simply "stop using parallel-ssh and write your own code using netmiko/paramiko. Also, upgrade to python 3 already."
But here's my issue: I'm using parallel-ssh to try to hit as many as 80 devices at a time. These devices are notoriously unreliable, and they occasionally freeze up after giving one or two lines of output. Then, the parallel-ssh code hangs for hours, leaving the script running, well, until I kill it. I've jumped onto the VM running the scripts after a weekend and seen a job that's been stuck for 52 hours.
The relevant pieces of my first code, the one that hangs:
from pssh.pssh2_client import ParallelSSHClient
def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
result = client.run_command(cmd, stop_on_errors=False)
return result
The next thing I tried was the channel_timout option, because if it takes more than 4 minutes to get the command output, then I know that the device froze, and I need to move on and cycle it later in the script:
from pssh.pssh_client import ParallelSSHClient
def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, channel_timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
result = client.run_command(cmd, stop_on_errors=False)
return result
This version never actually connects to anything. Any advice? I haven't been able to find anything other than channel_timeout to attempt to kill an ssh session after a certain amount of time.
Upvotes: 0
Views: 1450
Reputation: 5270
The code is creating a client object inside a function and then returning only the output of run_command
which includes remote channels to the SSH server.
Since the client
object is never returned by the function it goes out of scope and gets garbage collected by Python which closes the connection.
Trying to use remote channels on a closed connection will never work. If you capture stack trace of the stuck script it is most probably hanging at using remote channel or connection.
Change your code to keep the client alive. Client should ideally also be reused.
from pssh.pssh2_client import ParallelSSHClient
def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
result = client.run_command(cmd, stop_on_errors=False)
return client, result
Make sure you understand where the code is going wrong before jumping to conclusions that will not solve the issue, ie capture stack trace of where it is hanging. Same code doing the same thing will break the same way..
Upvotes: 0