Reputation: 203
I have implemented some logic in Spark. Execution of this logic depends on the parameter passed to the Java code. In my main method I have a switch case. When I pass a correct parameter, everything works fine. When I pass a random parameter that is not in my switch case, I get a Spark exception: "finished with failed status" (full stack trace below).
If I run the same code in client mode, it does not throw any exception with an incorrect parameter, which I guess is the correct behavior.
I create the context only when my parameter is correct. So essentially, if I run an empty main method in cluster mode, I get the exception.
Can someone explain to me how this works? How can I avoid this exception?
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class MyClass {
    private JavaSparkContext context = null;
    private HiveContext hiveContext = null;

    public static void main(String[] args) {
        MyClass obj = new MyClass();
        obj.startProcessing(args);
    }
    // startProcessing(String[] args) omitted; it contains the switch case
}
The startProcessing method simply contains a switch case.
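To illustrate the structure (a simplified sketch, not my exact code; the case label and the body of each case are placeholders), startProcessing looks roughly like this:

    private void startProcessing(String[] args) {
        switch (args[0]) {
            case "process":  // a valid parameter
                // context is created only for known parameters
                context = new JavaSparkContext();  // loads settings provided by spark-submit
                hiveContext = new HiveContext(context.sc());
                // ... actual processing ...
                break;
            default:
                // unknown parameter: no Spark context is ever created
                System.out.println("Unknown parameter: " + args[0]);
        }
    }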
Thanks
Stacktrace:
Exception in thread "main" org.apache.spark.SparkException: Application application_1466638963111_3824 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1036)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1083)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Upvotes: 0
Views: 4263
Reputation: 211
When you run your application in 'client' mode, the script that you use to launch it (which I think is bin/spark-submit) runs your main class directly. Your main class has to create the Spark context, and that Spark context connects to the Spark cluster, or to the cluster manager if you use one such as Mesos or YARN. In this case, if you don't create a Spark context, that is fine: your main class simply does nothing and exits.
In cluster mode, Spark does not run your main class directly. It creates a client, builds a submit request, and submits that request to the cluster; the request carries your main class name as a parameter. When the cluster receives the request, it launches your main class, while the client waits for a response from the cluster confirming that the Spark context was created successfully, along with information about the created context.
But if you never create a context, the client cannot connect to it, never receives the response that a context was successfully created, and throws an exception.
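So one way to avoid the exception is to create the Spark context unconditionally in main (and stop it when you are done), so that in cluster mode the client always gets the response it is waiting for. A rough sketch based on your class (the names come from your question; the rest is an assumption about your code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MyClass {
        private JavaSparkContext context = null;

        public static void main(String[] args) {
            MyClass obj = new MyClass();
            // Create the context before the switch case, so the cluster-mode
            // client always receives the "context created" notification.
            obj.context = new JavaSparkContext(new SparkConf().setAppName("MyClass"));
            try {
                obj.startProcessing(args);  // the switch can still reject bad parameters
            } finally {
                obj.context.stop();         // shut the context down in every branch
            }
        }

        private void startProcessing(String[] args) {
            // your existing switch case goes here
        }
    }

With the context created up front, an unrecognized parameter can simply fall through to the default branch of your switch, and the application should still finish successfully.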
You can also add the --verbose flag to your launch script, and you will see that in client mode it launches your main class directly, while in cluster mode it launches a different main class (the org.apache.spark.deploy.yarn.Client visible in your stack trace).
Upvotes: 4