Reputation: 341
I cannot run either pyspark scripts or shell commands using a command-runner.jar step on AWS EMR. I first tried a pyspark script following these instructions, and the step failed with:
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "spark-submit s3://dev-emrworkshop-data-810526023897-us-west-1/files/spark-etl-02.py s3://dev-emrworkshop-data-810526023897-us-west-1/input s3://dev-emrworkshop-data-810526023897-us-west-1/output" (in directory "."): error=2, No such file or directory
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:140)
at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "spark-submit s3://dev-emrworkshop-data-810526023897-us-west-1/files/spark-etl-02.py s3://dev-emrworkshop-data-810526023897-us-west-1/input s3://dev-emrworkshop-data-810526023897-us-west-1/output" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:93)
... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 8 more
The script is indeed accessible in S3, but I cannot run shell commands either. The same error is returned:
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "echo $PWD" (in directory "."): error=2, No such file or directory
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:140)
at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "echo $PWD" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:93)
... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 8 more
Using EMR release emr-6.5.0.
Upvotes: 0
Views: 838
Reputation: 341
I was able to run the pyspark script successfully using the AWS CLI:
aws emr add-steps \
--cluster-id j-22CH78RKK2T39 \
--steps 'Type=CUSTOM_JAR,Name="Run spark-submit using command-runner.jar",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[spark-submit,s3://<bucket>/files/spark-etl-02.py,s3://<bucket>/input,s3://<bucket>/output]'
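It looks like I formatted the command arguments incorrectly. In the failing step the whole command line was passed as a single argument, so command-runner.jar tried to execute the entire string (program name plus its arguments) as one program, which is why the trace reports error=2, No such file or directory. Each argument should be its own comma-separated entry in Args. This calls the pyspark script directly from S3 per these docs.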
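For anyone submitting steps from Python rather than the CLI, here is a minimal sketch of the same idea: split the command into one list entry per argument before handing it to command-runner.jar. The helper name build_command_runner_step is hypothetical, and the dictionary layout assumes the step structure that boto3's add_job_flow_steps accepts; the bucket and cluster ID are placeholders. I have not run this against a cluster.

```python
import shlex


def build_command_runner_step(name, command, action_on_failure="CONTINUE"):
    """Build a step definition that runs `command` via command-runner.jar.

    Hypothetical helper: splits the command string into separate arguments
    so the program name and each parameter become their own Args entry.
    """
    args = shlex.split(command)  # splits on whitespace, honouring quotes
    if not args:
        raise ValueError("command must not be empty")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


step = build_command_runner_step(
    "Run spark-submit using command-runner.jar",
    "spark-submit s3://<bucket>/files/spark-etl-02.py "
    "s3://<bucket>/input s3://<bucket>/output",
)
print(step["HadoopJarStep"]["Args"])

# Assumption: with boto3 installed and credentials configured, the step
# could then be submitted along these lines (placeholder cluster ID):
#   boto3.client("emr").add_job_flow_steps(JobFlowId="<cluster-id>", Steps=[step])
```

The same splitting probably explains the failing shell test as well: "echo $PWD" was looked up as a single program name. Since variable expansion needs a shell, passing something like bash, -c, "echo $PWD" as three separate arguments is likely what is required there; build_command_runner_step("pwd", 'bash -c "echo $PWD"') yields exactly those three entries.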
Upvotes: 1