ds_user

Reputation: 2179

EMR step to copy file from S3 to Spark lib

I have a JDBC driver that my Spark job depends on sitting in S3. I am trying to load it into the Spark lib folder as soon as the cluster is ready, so I created the step below in my shell script, ahead of the spark-submit job:

--steps "[{\"Args\":[\"/usr/bin/hdfs\",\"dfs\",\"-get\",
                 \"s3://xxxx/jarfiles/sqljdbc4.jar\",
                 \"/usr/lib/spark/jars/\"],
         \"Type\":\"CUSTOM_JAR\",
         \"ActionOnFailure\":\"$STEP_FAILURE_ACTION\",
         \"Jar\":\"s3://elasticmapreduce/libs/script-runner/script-runner.jar\",
         \"Properties\":\"\",
         \"Name\":\"Custom JAR\"},
         {\"Args\":[\"spark-submit\",
                 \"--deploy-mode\", \"cluster\",
                 \"--class\", \"dataload.data_download\",
                 \"/home/hadoop/data_to_s3-assembly-0.1.jar\"],
         \"Type\":\"CUSTOM_JAR\",
         \"ActionOnFailure\":\"$STEP_FAILURE_ACTION\",
         \"Jar\":\"s3://xxxx.elasticmapreduce/libs/script-runner/script-runner.jar\",
         \"Properties\":\"\",
         \"Name\":\"Data_Download_App\"}]"

But I keep getting a permission denied error at the dfs -get step. I tried providing "sudo /usr/bin/hdfs", but then I get "no such file" for "sudo /usr/bin/hdfs". How do I use sudo here? Or is there any other way to copy a file from S3 into the Spark lib folder as part of a step? I also tried doing this in a bootstrap action, but during the bootstrap action no Spark folder has been created yet, so that fails as well. Thanks.
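
The sudo variant I tried looked roughly like the fragment below; I am guessing it fails because the whole first Args entry gets treated as a single executable path rather than as a command plus its argument:

--steps "[{\"Args\":[\"sudo /usr/bin/hdfs\",\"dfs\",\"-get\",
                 \"s3://xxxx/jarfiles/sqljdbc4.jar\",
                 \"/usr/lib/spark/jars/\"], ...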

Upvotes: 2

Views: 1832

Answers (1)

ds_user

Reputation: 2179

Updating the answer here for anyone who is looking for the same thing. I ended up doing it with a shell script, run as a step, that copies the jars into the spark/jars folder.

Steps = [{
            'Name': 'copy spark jars to the spark folder',
            'ActionOnFailure': 'CANCEL_AND_WAIT',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                'Args': ['sudo', 'bash', '/home/hadoop/reqd_files_setup.sh', self.script_bucket_name]
            }
        }]
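
A minimal sketch of submitting a Steps list like this with boto3, assuming an already-running cluster (the cluster id below is a placeholder, and self.script_bucket_name comes from the surrounding class):

import boto3

# Add the step to an existing cluster; replace the JobFlowId with your cluster id.
emr = boto3.client('emr')
response = emr.add_job_flow_steps(JobFlowId='j-XXXXXXXXXXXXX', Steps=Steps)
print(response['StepIds'])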

The copy command in the shell script:

sudo aws s3 cp s3://bucketname/ /usr/lib/spark/jars/ --recursive --exclude "*" --include "*.jar" 
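
Since the step passes self.script_bucket_name as the first argument to the script, a version of reqd_files_setup.sh that reads the bucket name from that argument might look like this sketch (the bucketname above is a placeholder):

#!/bin/bash
# reqd_files_setup.sh - copy jars from the given S3 bucket into Spark's jars directory.
# The bucket name is passed by the EMR step as the first argument.
set -e
BUCKET="$1"
sudo aws s3 cp "s3://${BUCKET}/" /usr/lib/spark/jars/ --recursive --exclude "*" --include "*.jar"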

Upvotes: 3
