Reputation: 2436
I'm trying to provision an EMR with a bootstrap action. I can see the stdout log and it finishes fine. The last action is install boto3.
Installing collected packages: jmespath, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.18.28 botocore-1.21.28 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.0
However after that EMR fails with "On the master instance, application provisioning failed". See log below.
I think this might be due to what I install in the bootstrap. java 11, python 3.7 etc. However, If run the same script manually via SSH after EMR has been provisioned everything works fine. Is there any way to execute the bootstrap action after all applications have been installed?
Error log: from provision-node/apps-phase/0/60c849d6-ca64-486d-8b4a-4c60201b168f/
2021-08-25 15:01:07,025 ERROR main: Encountered a problem while provisioning
com.amazonaws.emr.node.provisioner.puppet.api.PuppetException: Unable to complete transaction and some changes were applied.
at com.amazonaws.emr.node.provisioner.puppet.api.ApplyCommand.handleExitcode(ApplyCommand.java:74)
at com.amazonaws.emr.node.provisioner.puppet.api.ApplyCommand.call(ApplyCommand.java:56)
at com.amazonaws.emr.node.provisioner.bigtop.BigtopPuppeteer.applyPuppet(BigtopPuppeteer.java:73)
at com.amazonaws.emr.node.provisioner.bigtop.BigtopDeployer.deploy(BigtopDeployer.java:22)
at com.amazonaws.emr.node.provisioner.NodeProvisioner.provision(NodeProvisioner.java:25)
at com.amazonaws.emr.node.provisioner.workflow.NodeProvisionerWorkflow.doWork(NodeProvisionerWorkflow.java:196)
at com.amazonaws.emr.node.provisioner.workflow.NodeProvisionerWorkflow.work(NodeProvisionerWorkflow.java:101)
at com.amazonaws.emr.node.provisioner.Program.main(Program.java:30)
Upvotes: 1
Views: 1281
Reputation: 803
There is a way to run post provisioning (second stage) bootstrapping actions on an EMR. It's a bit of a hack and it works like this.
As your last bootstrapping action you need to copy the file you want to run after the installation of stuff like Hadoop or Spark from S3 into the node and run it as a background action.
The background action will wait for the node to be fully provisioned before it actually runs the code you wanted to run in the first place before it exits the loop.
Here's the code:
set_up_post_provisioning.sh
#!/bin/bash -x
aws s3 cp s3://path/to/bootstrap/scripts/post_provisioning.sh /home/hadoop/post_provisioning.sh &&
sudo bash /home/hadoop/post_provisioning.sh &
exit 0
post_provisioning.sh
#!/bin/bash
while true
do
NODEPROVISIONSTATE=$(sed -n '/localInstance [{]/,/[}]/{
/nodeProvisionCheckinRecord [{]/,/[}]/ {
/status: / { p }
/[}]/a
}
/[}]/a
}' /emr/instance-controller/lib/info/job-flow-state.txt | awk '{ print $2 }')
if [[ "$NODEPROVISIONSTATE" == "SUCCESSFUL" ]]
then
sleep 10
echo "Your code here"
exit
fi
sleep 10
done
Make sure that only set_up_post_provisioning.sh
is an actual bootstrap action, since the next stage will not start if all the bootstrap actions are not finished.
I hope it helps!
Upvotes: 2