Koenig Lear
Koenig Lear

Reputation: 2436

boostrap action EMR post application installation

I'm trying to provision an EMR with a bootstrap action. I can see the stdout log and it finishes fine. The last action is install boto3.

Installing collected packages: jmespath, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.18.28 botocore-1.21.28 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.0

However after that EMR fails with "On the master instance, application provisioning failed". See log below.

I think this might be due to what I install in the bootstrap. java 11, python 3.7 etc. However, If run the same script manually via SSH after EMR has been provisioned everything works fine. Is there any way to execute the bootstrap action after all applications have been installed?

Error log: from provision-node/apps-phase/0/60c849d6-ca64-486d-8b4a-4c60201b168f/

2021-08-25 15:01:07,025 ERROR main: Encountered a problem while provisioning
com.amazonaws.emr.node.provisioner.puppet.api.PuppetException: Unable to complete transaction and some changes were applied.
    at com.amazonaws.emr.node.provisioner.puppet.api.ApplyCommand.handleExitcode(ApplyCommand.java:74)
    at com.amazonaws.emr.node.provisioner.puppet.api.ApplyCommand.call(ApplyCommand.java:56)
    at com.amazonaws.emr.node.provisioner.bigtop.BigtopPuppeteer.applyPuppet(BigtopPuppeteer.java:73)
    at com.amazonaws.emr.node.provisioner.bigtop.BigtopDeployer.deploy(BigtopDeployer.java:22)
    at com.amazonaws.emr.node.provisioner.NodeProvisioner.provision(NodeProvisioner.java:25)
    at com.amazonaws.emr.node.provisioner.workflow.NodeProvisionerWorkflow.doWork(NodeProvisionerWorkflow.java:196)
    at com.amazonaws.emr.node.provisioner.workflow.NodeProvisionerWorkflow.work(NodeProvisionerWorkflow.java:101)
    at com.amazonaws.emr.node.provisioner.Program.main(Program.java:30)

Upvotes: 1

Views: 1281

Answers (1)

A Campos
A Campos

Reputation: 803

There is a way to run post provisioning (second stage) bootstrapping actions on an EMR. It's a bit of a hack and it works like this.

As your last bootstrapping action you need to copy the file you want to run after the installation of stuff like Hadoop or Spark from S3 into the node and run it as a background action.

The background action will wait for the node to be fully provisioned before it actually runs the code you wanted to run in the first place before it exits the loop.

Here's the code:

set_up_post_provisioning.sh

#!/bin/bash -x

aws s3 cp s3://path/to/bootstrap/scripts/post_provisioning.sh /home/hadoop/post_provisioning.sh &&
sudo bash /home/hadoop/post_provisioning.sh &
exit 0

post_provisioning.sh

#!/bin/bash

while true
do
  NODEPROVISIONSTATE=$(sed -n '/localInstance [{]/,/[}]/{
      /nodeProvisionCheckinRecord [{]/,/[}]/ {
      /status: / { p }
      /[}]/a
      }
      /[}]/a
    }' /emr/instance-controller/lib/info/job-flow-state.txt | awk '{ print $2 }')

  if [[ "$NODEPROVISIONSTATE" == "SUCCESSFUL" ]]
  then
    sleep 10

    echo "Your code here"

    exit
  fi

  sleep 10
done

Make sure that only set_up_post_provisioning.sh is an actual bootstrap action, since the next stage will not start if all the bootstrap actions are not finished.

I hope it helps!

Upvotes: 2

Related Questions