Reputation: 28659
I am using the googleapiclient Python API to start a VM, and then paramiko to connect to it via SSH.
I use googleapiclient.discovery to get the GCE API:
compute = googleapiclient.discovery.build('compute', 'v1')
I start my VM using the start API call:
req = compute.instances().start(project=project, zone=zone, instance=instance)
resp = req.execute()
while resp['status'] != 'DONE':
    time.sleep(1)
    resp = req.execute()
I then perform a get request to find the VM details, and in turn the ephemeral external IP address:
req = compute.instances().get(project=project, zone=zone, instance=instance)
info = req.execute()
ip_address = info['networkInterfaces'][0]['accessConfigs'][0]['natIP']
Finally, I use paramiko to connect to this IP address:
ssh_client = paramiko.SSHClient()
ssh_client.connect(ip_address)
Non-deterministically, the connect call fails:
.../lib/python3.6/site-packages/paramiko/client.py", line 362, in connect
    raise NoValidConnectionsError(errors)
paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on xxx.xxx.xxx.xxx
It seems to be timing related, as putting in a time.sleep(5) before the ssh_client.connect call has prevented this error.
I'm assuming this allows sufficient time for sshd to start accepting connections, but I'm not certain.
Putting sleeps in my code is uber hacky, so I'd much prefer to find a way to deterministically wait until the ssh daemon is running and available for me to connect to it (if that is indeed the cause of the NoValidConnectionsError exception).
Is there a way to tell from the start call when the VM is running and sshd is available for me to connect to?
Alternately, I see paramiko has a timeout option in the connect call - should I just change my 5 second sleep to a 5 second timeout?
Upvotes: 2
Views: 77
Reputation: 7737
There’s no way for GCE to know if the guest is SSH-able. (For instance, imagine a case where the guest uses a nonstandard method for allowing remote connections, so even checking sshd wouldn’t work. Even if you could rely on sshd, the way to check that it’s running depends on its version, host OS, configuration, etc.) GCE only knows hardware-level information about the VM, such as whether it rebooted.
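Since GCE cannot report this, one option is to probe from the client side. As a rough sketch (the helper name port_is_open and its defaults are made up for illustration), a plain TCP connect to port 22 tells you whether something is listening there, though not that sshd is ready to authenticate you:

```python
import socket


def port_is_open(host, port=22, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout_s.

    Hypothetical helper: it only shows that something accepts connections
    on the port, not that the SSH handshake or login will succeed.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection in time.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

You could call port_is_open(ip_address) in a loop before ssh_client.connect, but treat a True result as a hint only; the actual connect can still fail and should be handled.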
To solve your problem, I would try the timeout mechanism in paramiko like you described, or maybe retry the connection attempt in a loop with a timeout since paramiko might not implement a full-state-reset retry internally (just speculating, I’m not sure).
Also, I think 5 seconds may be a little low — it’s probably fine for average response time, but outliers will be slower, which could cause your connection attempts to be flaky. Maybe bump that to 30 seconds or a minute just to be totally safe.
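A minimal sketch of such a retry loop, under some assumptions: retry_until_deadline and connect_ssh are hypothetical helper names, the 60 second deadline and 2 second interval are arbitrary picks, and host key and authentication setup are left out as in the question. It builds a fresh SSHClient for every attempt so no state carries over from a failed one, and it retries only on OSError, which covers NoValidConnectionsError and socket timeouts:

```python
import time


def retry_until_deadline(action, retry_on, deadline_s=60.0, interval_s=2.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns, or re-raise once deadline_s is used up."""
    give_up_at = clock() + deadline_s
    while True:
        try:
            return action()
        except retry_on:
            # Stop if waiting one more interval would go past the deadline.
            if clock() + interval_s > give_up_at:
                raise
            sleep(interval_s)


def connect_ssh(ip_address, deadline_s=60.0):
    """Hypothetical wrapper around paramiko; returns a connected SSHClient."""
    import paramiko  # imported lazily so the helper above needs only the stdlib

    def attempt():
        client = paramiko.SSHClient()  # fresh client per attempt
        client.connect(ip_address, timeout=5)  # per-attempt TCP timeout
        return client

    # paramiko's NoValidConnectionsError and socket timeouts are OSError subclasses.
    return retry_until_deadline(attempt, retry_on=OSError, deadline_s=deadline_s)
```

With this, ssh_client = connect_ssh(ip_address) would replace the sleep plus connect pair. If sshd accepts the TCP connection but is not yet speaking SSH, paramiko may raise SSHException instead, so you might want to add that to retry_on, keeping in mind that authentication failures are SSHException subclasses too.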
Upvotes: 2