Reputation: 309
I am creating a Spark SQL application and I want to run it on a remote Spark cluster from my local machine, directly from my IDE. I know that I need to set some options when I create the SparkConf object, something like this:
SparkConf conf = new SparkConf()
.setMaster("spark://SPARK-MASTER-ADDRESS:7077")
.set("spark.driver.host","my local IP Address")
.setJars(new String[]{"build\\libs\\spark-test-1.0-SNAPSHOT.jar"})
.setAppName("APP-NAME");
This works from the IDE and everything is OK, but my questions are:
1) Do I need to rebuild my app's jar file and pass its path to the setJars method every time I change anything? I have seen it said on some forums that you need to rebuild the jar after every change, but rebuilding it every time seems cumbersome. Is there a better way?
2) Why is it sometimes not necessary to call setJars, even though I run the program from the IDE? For example, when I do not use a lambda function in my code, there is no need to call setJars. Assume I have a Person class with two fields: CustomerNo and AccountNo. When I use a lambda function in my code like this (personDS is a Dataset of Person objects):
personDS.filter(f -> f.getCustomerNo().equals("001")).show();
the following error occurs:
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
but when I don't use a lambda function, like this:
personDS.filter(col("customerNo").equalTo(001)).show();
no error occurs. Why does this happen? Why do I have to use setJars when I use a lambda function? When should I use setJars and when not?
Upvotes: 0
Views: 541
Reputation: 727
I am assuming here that you are not using spark-submit and that you are running the Spark program directly from your IDE.
Below is my answer to your first question:
1) Do I need to rebuild the jar file of my app every time I change anything? Yes. To deploy your changes you need to rebuild the jar each time you change the code. I use Maven for this.
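The rebuild itself cannot be skipped, but the hard-coded jar path can. As a rough sketch (not something from the question), a small helper could pick the most recently built jar from the build output directory, so the path passed to setJars does not need editing after each build. The class name `LatestJar`, the method `newestJar`, and the default directory `build/libs` are assumptions made for illustration; only standard JDK calls are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: finds the most recently modified *.jar in a directory.
class LatestJar {

    static Optional<Path> newestJar(Path dir) throws IOException {
        // Files.list is not recursive; the stream must be closed after use.
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .max(Comparator.comparingLong(p -> p.toFile().lastModified()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed build output directory; pass another one as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "build/libs");
        Optional<Path> jar = newestJar(dir);
        if (jar.isPresent()) {
            // The result could then be handed to SparkConf, for example:
            // conf.setJars(new String[]{ jar.get().toString() });
            System.out.println("Newest jar: " + jar.get());
        } else {
            System.out.println("No jar found in " + dir + ", build the project first");
        }
    }
}
```

You still have to run the build (for example from a pre-launch step in the IDE) before starting the application; the helper only saves you from updating the path by hand.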
For the second question:
I think that whenever you perform a map-style operation (such as filter) with a lambda that refers to methods or classes of your project, you need to supply those classes to the executors as an additional jar.
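To illustrate why a lambda pulls in your project's classes, here is a minimal plain-Java sketch with no Spark involved. A serializable Java lambda is written out as a `java.lang.invoke.SerializedLambda`, which records the class the lambda was defined in, so whoever deserializes it needs that class on its classpath. The names `LambdaNeedsJar`, `SerPredicate`, and `sampleFilter` are hypothetical, and the sketch reads the private `writeReplace` method that the JVM generates for serializable lambdas via reflection. My assumption is that the same mechanism is behind your error: the executors receive the serialized lambda but cannot load your class unless the jar is shipped, whereas a `col(...)` expression is built only from Spark's own classes.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Predicate;

// Hypothetical demo class standing in for "a class of your project".
class LambdaNeedsJar {

    // A Predicate that can be serialized, similar in spirit to Spark's function interfaces.
    interface SerPredicate<T> extends Predicate<T>, Serializable {}

    // A lambda defined inside this class, comparable to the filter lambda in the question.
    static SerPredicate<String> sampleFilter() {
        return customerNo -> customerNo.equals("001");
    }

    // Returns the name of the class that the serialized form of the lambda refers to.
    static String capturingClassOf(Serializable lambda) {
        try {
            // Serializable lambdas get a synthetic private writeReplace() method.
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
            return serialized.getCapturingClass();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Not a serializable lambda", e);
        }
    }

    public static void main(String[] args) {
        // Prints the internal name of the class in which the lambda was defined.
        System.out.println(capturingClassOf(sampleFilter()));
    }
}
```

Running `main` prints the name of the class that defines the lambda, which is exactly the class a remote JVM would have to load before it could run the filter.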
Upvotes: 2