Naresh G

Reputation: 83

Springboot with Spark in Yarn mode

I am trying to run a long-lived Spark session from Spring Boot. My aim is to run Spark in YARN mode with Spring Boot.

  1. I would like to have only one jar file as the artifact and do not want to separate out the Spark dependencies.
  2. Apart from the code below, do I need to add any configuration? It always tries to connect to localhost instead of the actual host. (RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 20/01/23 20:14:14 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032)
  3. Is any separate configuration required to capture worker logs along with the driver logs?
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-Dlog4j.configuration=file://src/main/resources/log4j.properties")
    .set("spark.executor.extraJavaOptions", "-Dlog4j.configuration=file://src/main/resources/log4j.properties")
    .set("yarn.resoursemanager.address", "http://my-yarn-host")
    .set("spark.yarn.jars", "BOOT-INF/lib/spark-*.jar")
    .setAppName("NG-Workbench")
    .setMaster("yarn");

JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> words = sc.parallelize(Arrays.asList("Michel", "Steve"));
Map<String, Long> wordCounts = words.countByValue();
wordCounts.forEach((k, v) -> System.out.println(k + " " + v));
sc.close();

Upvotes: 2

Views: 735

Answers (1)

maxime G

Reputation: 1771

I would suggest adding these configuration files to your artifact:

  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
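
If you go this route, a minimal yarn-site.xml on the classpath could look like the sketch below. `my-yarn-host` is a placeholder for your actual ResourceManager host; 8032 is the default Apache Hadoop ResourceManager port, which is also the address Spark falls back to (0.0.0.0:8032) in the question's log when no configuration is found:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- placeholder: replace my-yarn-host with your ResourceManager host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>my-yarn-host</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>my-yarn-host:8032</value>
  </property>
</configuration>
```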

Otherwise, you can set these two properties in your SparkConf:

  • spark.hadoop.yarn.resourcemanager.address : "XXXX:8050"
  • spark.hadoop.yarn.resourcemanager.hostname: "XXXX"
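
As a sketch, in spark-defaults.conf form (`my-yarn-host` is a placeholder; the port depends on your distribution — 8032 is the Apache Hadoop default, 8050 is common on HDP):

```
spark.hadoop.yarn.resourcemanager.address   my-yarn-host:8032
spark.hadoop.yarn.resourcemanager.hostname  my-yarn-host
```

The same keys can equally be passed programmatically via `conf.set(...)` in the question's SparkConf chain; the `spark.hadoop.` prefix tells Spark to forward the rest of the key into the Hadoop configuration.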

Upvotes: 0
