Reputation: 1042
Hello everyone, and happy new year ;)!
I am building a lambda architecture with Apache Spark, HDFS and Elasticsearch.
The following picture shows what I am trying to do:
So far, I have written the source code in Java for my Spark Streaming and Spark applications. I read in the Spark documentation that Spark can run on a Mesos or YARN cluster. As indicated in the picture, I already have a Hadoop cluster. Is it possible to run my Spark Streaming and Spark applications within this same Hadoop cluster? If yes, is there any particular configuration to do (for instance the number of nodes, RAM...)? Or do I have to add a separate Hadoop cluster just for Spark Streaming?
I hope my explanation is clear.
Yassir
Upvotes: 2
Views: 929
Reputation: 18270
You do not need to build a separate cluster for running Spark Streaming.
Change the spark.master property to yarn-client or yarn-cluster in the conf/spark-defaults.conf file. When specified so, the Spark application you submit will be handled by the ApplicationMaster of YARN and executed by the NodeManagers.
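For example, assuming a standard Spark installation, you could set the master either in the config file or directly when submitting (the class and jar names below are only placeholders for your own application):

In conf/spark-defaults.conf:
spark.master    yarn-cluster

Or on the command line:
spark-submit --master yarn-cluster --class com.example.StreamingJob my-streaming-app.jar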
Additionally, tune the memory and core properties so that Spark's resource requests line up with what YARN offers, as sketched after this list.

In spark-defaults.conf:
spark.executor.memory
spark.executor.cores
spark.executor.instances

In yarn-site.xml:
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores
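As a rough sketch of how the two sides fit together, here is an illustrative configuration for a node with 16 GB RAM and 8 cores. The numbers are only examples, not recommendations; size them for your own machines and remember that each executor also requests a memory overhead on top of spark.executor.memory.

In yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>14336</value>  <!-- leave some memory for the OS and Hadoop daemons -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

In spark-defaults.conf:
spark.executor.instances   4
spark.executor.cores       2     # cores per executor must not exceed the vcores available on a node
spark.executor.memory      3g    # memory per executor plus overhead must stay below memory-mb per node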
Otherwise it could lead to either deadlock or poor resource utilization of the cluster.
Refer here for resource management of the cluster when running Spark on YARN.
Upvotes: 1
Reputation: 2333
It is possible. You can submit your streaming and batch applications to the same YARN cluster, but sharing cluster resources between the two jobs can be a bit tricky (as per my understanding).

So I would suggest you look at Spark Jobserver to submit your applications. Spark-jobserver makes your life easier when you want to maintain multiple Spark contexts, and all the required configuration for both applications lives in one place.
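If you go the spark-jobserver route, jobs are submitted over its REST API rather than with spark-submit. Roughly, the flow looks like this; the port (8090 is the default), app name, context name and class path below are placeholders, and the exact endpoints may vary with the jobserver version:

# upload your application jar under an app name
curl --data-binary @my-streaming-app.jar localhost:8090/jars/myapp

# create a long-lived context with its own resources
curl -d "" 'localhost:8090/contexts/streaming-context?num-cpu-cores=2&memory-per-node=1g'

# run a job from the uploaded jar in that context
curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=com.example.StreamingJob&context=streaming-context'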
Upvotes: 1