Reputation: 2179
I have my dependency JDBC driver for Spark in S3, and I am trying to load it into the Spark lib folder as soon as the cluster is ready, so I created the step below in my shell script before the spark-submit job:
--steps "[{\"Args\":[\"/usr/bin/hdfs\",\"dfs\",\"-get\",
\"s3://xxxx/jarfiles/sqljdbc4.jar\",
\"/usr/lib/spark/jars/\"],
\"Type\":\"CUSTOM_JAR\",
\"ActionOnFailure\":\"$STEP_FAILURE_ACTION\",
\"Jar\":\"s3://elasticmapreduce/libs/script-runner/script-runner.jar\",
\"Properties\":\"\",
\"Name\":\"Custom JAR\"},
{\"Args\":[\"spark-submit\",
\"--deploy-mode\", \"cluster\",
\"--class\", \"dataload.data_download\",
\"/home/hadoop/data_to_s3-assembly-0.1.jar\"],
\"Type\":\"CUSTOM_JAR\",
\"ActionOnFailure\":\"$STEP_FAILURE_ACTION\",
\"Jar\":\"s3://xxxx.elasticmapreduce/libs/script-runner/script-runner.jar\",
\"Properties\":\"\",
\"Name\":\"Data_Download_App\"}]"
But I keep getting a permission denied error at the dfs -get step. I tried providing "sudo /usr/bin/hdfs", but then I get a "no such file" error for "sudo /usr/bin/hdfs". How do I use sudo here? Or is there another way to copy a file from S3 to the Spark lib folder as part of a step? I tried doing this in a bootstrap action, but the Spark folder does not exist yet during the bootstrap action, so that fails as well. Thanks.
Upvotes: 2
Views: 1832
Reputation: 2179
Updating the answer here for anyone looking for the same thing. I ended up doing it in a shell script, invoked from a step, that copies the jars to the spark/jars folder.
Steps = [{
    'Name': 'copy spark jars to the spark folder',
    'ActionOnFailure': 'CANCEL_AND_WAIT',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': ['sudo', 'bash', '/home/hadoop/reqd_files_setup.sh', self.script_bucket_name]
    }
}]
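For anyone launching from the AWS CLI rather than boto3, a roughly equivalent step in the --steps JSON format used in the question would look like the sketch below (the cluster id and bucket name are placeholders):

aws emr add-steps --cluster-id <your-cluster-id> --steps '[{
    "Type": "CUSTOM_JAR",
    "Name": "copy spark jars to the spark folder",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Jar": "command-runner.jar",
    "Args": ["sudo", "bash", "/home/hadoop/reqd_files_setup.sh", "<your-bucket-name>"]
}]'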
The command in the shell script:
sudo aws s3 cp s3://bucketname/ /usr/lib/spark/jars/ --recursive --exclude "*" --include "*.jar"
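For completeness, a minimal sketch of what the full reqd_files_setup.sh could look like, assuming the bucket name is passed in as the first argument (matching the Args above); anything beyond the copy command itself is illustrative:

#!/bin/bash
# reqd_files_setup.sh (sketch): copy all jars from the given S3 bucket into the Spark jars folder.
# $1 is the bucket name passed from the step Args (self.script_bucket_name).
BUCKET_NAME="$1"
sudo aws s3 cp "s3://${BUCKET_NAME}/" /usr/lib/spark/jars/ --recursive --exclude "*" --include "*.jar"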
Upvotes: 3