TheM00s3

Reputation: 3711

spark-submit failing when jar is on s3

I've got a Docker container that's been set up as follows on an EC2 server:

docker run --name master  -d -p 7077:7077 -e AWS_ACCESS_KEY_ID='MY_ID' -e AWS_SECRET_ACCESS_KEY='MY_SECRET_KEY' gettyimages/spark

I run the spark-submit process with the following command.

docker exec -it master bin/spark-submit --master spark://0.0.0.0:7077 --verbose --class my/class s3://myBucket/path

Here is the printout from the run:

Warning: Skip remote jar s3://myBucket/MyBin.
java.lang.ClassNotFoundException: my/class
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Upvotes: 1

Views: 1445

Answers (1)

stevel

Reputation: 13490

This is one of those things where a copy of the source code and an IDE that can scan through it helps... a quick grep shows that it only supports file:/ and local:/ URLs for the application JAR.

AFAIK the application JAR must always be local, though anything listed with --jars will, if it is visible from inside the Spark cluster itself, be picked up and added to the classpath of the workers. A minimal workaround sketch follows.
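As a rough sketch of that workaround for your setup: copy the JAR down to the EC2 host, push it into the container, and give spark-submit a local path. This assumes the AWS CLI is installed and configured on the host; MyBin.jar and my.class are placeholders standing in for the redacted names in the question.

aws s3 cp s3://myBucket/MyBin.jar /tmp/MyBin.jar

docker cp /tmp/MyBin.jar master:/tmp/MyBin.jar

docker exec -it master bin/spark-submit --master spark://0.0.0.0:7077 --verbose --class my.class /tmp/MyBin.jar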

Upvotes: 1
