abk

Reputation: 341

"No such file" error for EMR step using command-runner.jar

I am not able to run either PySpark scripts or shell commands using a command-runner.jar step on AWS EMR. I first attempted to run a PySpark script, following these instructions, and got:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "spark-submit s3://dev-emrworkshop-data-810526023897-us-west-1/files/spark-etl-02.py s3://dev-emrworkshop-data-810526023897-us-west-1/input s3://dev-emrworkshop-data-810526023897-us-west-1/output" (in directory "."): error=2, No such file or directory
        at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:140)
        at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "spark-submit s3://dev-emrworkshop-data-810526023897-us-west-1/files/spark-etl-02.py s3://dev-emrworkshop-data-810526023897-us-west-1/input s3://dev-emrworkshop-data-810526023897-us-west-1/output" (in directory "."): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:93)
        ... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 8 more
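The step was added with the entire command as a single argument string, i.e. an Args list roughly equivalent to this (bucket name shortened):

Args=["spark-submit s3://<bucket>/files/spark-etl-02.py s3://<bucket>/input s3://<bucket>/output"]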

The script is indeed accessible in S3, but I cannot run shell commands either; the same error is returned:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "echo $PWD" (in directory "."): error=2, No such file or directory
        at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:140)
        at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.io.IOException: Cannot run program "echo $PWD" (in directory "."): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:93)
        ... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 8 more

Using EMR release emr-6.5.0.

Upvotes: 0

Views: 838

Answers (1)

abk

Reputation: 341

I was able to run the PySpark script successfully using the AWS CLI:

aws emr add-steps \
    --cluster-id j-22CH78RKK2T39 \
    --steps 'Type=CUSTOM_JAR,Name="Run spark-submit using command-runner.jar",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[spark-submit,s3://<bucket>/files/spark-etl-02.py,s3://<bucket>/input,s3://<bucket>/output]'

It looks like I formatted the command arguments incorrectly: each argument must be a separate, comma-separated element of the Args list, not one space-separated string. Otherwise command-runner.jar treats the whole string as the name of a single executable, which is exactly what the "No such file or directory" error reports. This calls the PySpark script directly from S3, per these docs.
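The same rule explains the echo $PWD failure: command-runner.jar executes the command directly rather than through a shell, so shell syntax like $PWD is never interpreted. Wrapping the command in an explicit shell should work; here is an untested sketch along the same lines:

aws emr add-steps \
    --cluster-id j-22CH78RKK2T39 \
    --steps 'Type=CUSTOM_JAR,Name="Run shell command using command-runner.jar",ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[bash,-c,"echo $PWD"]'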

Upvotes: 1
