Brett
Brett

Reputation: 6030

chef zero via AWS userdata fails

When running chef zero via AWS userdata, the run always fails. However, if I ssh onto the machine and manually execute the same commands, it works as expected. This is the output that I get:

Chef: 11.12.8
[2014-06-11T12:40:34+00:00] INFO: Auto-discovered chef repository at /opt/chef-zero
[2014-06-11T12:40:34+00:00] INFO: Starting chef-zero on port 8889 with repository at repository at /opt/chef-zero
  One version per cookbook

[2014-06-11T12:40:34+00:00] INFO: Forking chef instance to converge...
[2014-06-11T12:40:35+00:00] DEBUG: Fork successful. Waiting for new chef pid: 1530
[2014-06-11T12:40:35+00:00] DEBUG: Forked instance now converging
[2014-06-11T12:40:35+00:00] ERROR: undefined method `[]' for nil:NilClass
[2014-06-11T12:40:35+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

The userdata that I set when launching the EC2 instance in AWS includes the following:

curl -L https://www.opscode.com/chef/install.sh | bash
mkdir /opt/chef-zero
cd /opt/chef-zero
wget http://myserver/chef-repo.tar.gz
tar zxf chef-repo
INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
cat <<EOF > /opt/chef-zero/solo.rb
ssl_verify_mode :verify_peer
node_name "$INSTANCE_ID"
EOF
/opt/chef/bin/chef-client -v >chef-zero.log 2>&1
/opt/chef/bin/chef-client -z -l debug -c solo.rb -o 'role[someRole]' -E BUILD >> chef-zero.log 2>&1

The AMI that I'm using is a custom one that was initially provisioned using knife + knife-ec2 (that bootstrapped chef 11.6.0 from an ubuntu 13.04 public ami). The omnibus installer from userdata (curl ... | bash) is upgrading chef to 11.12.8. The original knife run included chef-client::service in it's run, and the host is initially configured for use with chef-client + chef-server (i.e. there's a "validation.pem" and "client.rb" in /etc/chef - not sure if that makes a difference).

I am able to log onto the machine and execute chef-client -z -c solo.rb -o 'role[someRole]' -E BUILD as soon as the machine comes up (after waiting for files to be retrieved and the user-data chef-client to fail) and the chef run executes normally.

I have no idea why the userdata chef-client run fails with undefined method, any ideas what's causing it?

Upvotes: 0

Views: 814

Answers (2)

Brett
Brett

Reputation: 6030

After some further investigation, and thanks to bit of chatting with the #chef guys on freenode, the problem was narrowed down to the environment.

When executing the script with userdata, the "HOME" variable is not set. shell.rb from the chef gem is littered with references to ENV["HOME"].

SSH:

# unset HOME
# chef-client -z -o 'role[test]'
ERROR: undefined method `[]' for nil:NilClass
# export HOME=/root
# chef-client -z -o 'role[test]'
Starting Chef Client, version ....
...
Chef Client finished, ...

If you need to execute chef-client via user data, you should manually export HOME before trying to execute chef.

Bug has been reported at https://tickets.opscode.com/browse/CHEF-5365

edit

Submitted a pull request which has since been merged into master. https://github.com/opscode/chef/pull/1494

Upvotes: 1

Julian Dunn
Julian Dunn

Reputation: 286

This likely has nothing to do with chef-zero but indicates a problem in your recipe code (whatever's inside that chef-repo.tar.gz, or is driven by role[someRole]). It indicates an attempt to access a sub-element of a hash like

node['foo']['bar']

but when node['foo'] is nil (undefined)

Check the stacktrace that's generated by the chef client run to narrow it down.

Upvotes: 0

Related Questions