Reputation: 1567
I'm trying to launch a standalone Spark cluster using its pre-packaged EC2 scripts, but it just indefinitely hangs in an 'ssh-ready' state:
ubuntu@machine:~/spark-1.2.0-bin-hadoop2.4$ ./ec2/spark-ec2 -k <key-pair> -i <identity-file>.pem -r us-west-2 -s 3 launch test
Setting up security groups...
Searching for existing cluster test...
Spark AMI: ami-ae6e0d9e
Launching instances...
Launched 3 slaves in us-west-2c, regid = r-b_______6
Launched master in us-west-2c, regid = r-0______0
Waiting for all instances in cluster to enter 'ssh-ready' state..........
Yet I can SSH into these instances without complaint:
ubuntu@machine:~$ ssh -i <identity-file>.pem root@master-ip
Last login: Day MMM DD HH:mm:ss 20YY from c-AA-BBB-CCCC-DDD.eee1.ff.provider.net
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2013.03-release-notes/
There are 59 security update(s) out of 257 total update(s) available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2014.09 is available.
[root@ip-internal ~]$
I'm trying to figure out whether this is a problem with AWS or with the Spark scripts. I'd never had this issue until recently.
Upvotes: 5
Views: 3461
Reputation: 6940
This issue is fixed in Spark 1.3.0.
Your problem is caused by SSH silently failing because of conflicting entries in your SSH known_hosts file.
To resolve your issue, add -o UserKnownHostsFile=/dev/null to the SSH options in your spark_ec2.py script.
Optionally, to clean up and avoid problems connecting to your cluster over SSH later on, I recommend that you also remove the lines in your ~/.ssh/known_hosts file that include EC2 hosts, for example:
ec2-54-154-27-180.eu-west-1.compute.amazonaws.com,54.154.27.180 ssh-rsa (...)
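The cleanup step above can be scripted. This is a minimal sketch (not part of the original answer) that filters EC2-looking entries out of a known_hosts file; the substring heuristic is an assumption about what your entries look like, and `ssh-keygen -R <hostname>` is the standard per-host alternative:

```python
import os

def remove_ec2_entries(known_hosts_path):
    """Drop known_hosts lines that reference EC2 hosts (rough substring heuristic).

    Returns the number of lines removed.
    """
    path = os.path.expanduser(known_hosts_path)
    with open(path) as f:
        lines = f.readlines()
    # Keep any line that does not look like an EC2 public hostname entry.
    kept = [l for l in lines
            if "compute.amazonaws.com" not in l and not l.startswith("ec2-")]
    with open(path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```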
Upvotes: 4
Reputation: 1567
I used the absolute (not relative) path to my identity file (inspired by Peter Zybrick) and did everything Grzegorz Dubicki suggested. Thank you.
Upvotes: 1
Reputation: 21
I had the same problem and followed all the steps mentioned in the thread (mainly adding -o UserKnownHostsFile=/dev/null to the spark_ec2.py script), but it was still hanging at
Waiting for all instances in cluster to enter 'ssh-ready' state
The fix was to change the permissions of the private key file and rerun the spark-ec2 script:
[spar@673d356d]/tmp/spark-1.2.1-bin-hadoop2.4/ec2% chmod 0400 /tmp/mykey.pem
To troubleshoot, I modified spark_ec2.py to log the SSH command being used and tried executing it at the command prompt myself; the problem turned out to be bad permissions on the key file:
[spar@673d356d]/tmp/spark-1.2.1-bin-hadoop2.4/ec2% ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /tmp/mykey.pem -o ConnectTimeout=3 root@52.1.208.72
Warning: Permanently added '52.1.208.72' (RSA) to the list of known hosts.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/tmp/mykey.pem' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /tmp/mykey.pem
Permission denied (publickey).
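The permission check that ssh performs here can be reproduced in Python. A minimal sketch (not from the original answer; the function name is my own): it tests whether the key file is readable only by its owner, which is what modes like 0400 or 0600 guarantee, and os.chmod is the programmatic equivalent of the chmod 0400 fix above:

```python
import os
import stat

def key_is_private(path):
    """Return True if the file has no group/other permission bits set
    (e.g. mode 0400 or 0600), which is what ssh requires of identity files."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0

# Equivalent of `chmod 0400 /tmp/mykey.pem`:
# os.chmod("/tmp/mykey.pem", 0o400)
```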
Upvotes: 2
Reputation: 19
I just ran into the exact same situation. I went into the Python script, in def is_ssh_available(), and had it dump out the return code and command:
except subprocess.CalledProcessError, e:
    print "CalledProcessError "
    print e.returncode
    print e.cmd
I had the key file location as ~/.pzkeys/mykey.pem. As an experiment, I changed it to a fully qualified path, i.e. /home/pete.zybrick/.pzkeys/mykey.pem, and that worked OK.
Right after that, I ran into another error: I tried to use --user=ec2-user (I try to avoid using root) but got a permission error on rsync, so I removed --user=ec2-user so it would default to root, made another attempt with --resume, and it ran to successful completion.
Upvotes: 1