Reputation: 45124
I'm planning to build a reporting platform on top of existing data. I have an existing RDBMS that holds a large number of records, so I'm using the following architecture: Hadoop 2.7, Spark, Hive, JasperReports, and Sqoop.
Given that I have already read the following
Which mode should I use, and why? What should the decision be based on?
Upvotes: 0
Views: 424
Reputation: 38910
Adding some more info to Daniel Darabos's answer: apart from where the application is hosted, how failover works, and where the driver runs (in the Application Master in yarn-cluster mode, or in the client process in yarn-client mode), the other features remain the same. However, yarn-client mode supports spark-shell, unlike yarn-cluster mode.
Have a look at this article to understand the differences between running a Spark application in the various modes: YARN cluster, YARN client, and Spark standalone.
Make an informed decision after weighing the criteria for each option.
Upvotes: 1
Reputation: 27455
The decision is about whether you want your application to run as a YARN application or not.
A non-YARN application (which you get in yarn-client mode) is simpler. It's a classical Linux application, you can start it like any application and it runs on that machine like any application.
A YARN application (which you get in yarn-cluster mode) is managed by YARN. It runs on whatever machine YARN decides to put it on. If it dies, YARN will restart it, perhaps on a different machine. It is more robust (e.g. it will get restarted if the machine dies) but at the cost of complexity (e.g. you don't have a fixed IP address for the application).
I'd go with yarn-client at first. You can switch to yarn-cluster later if you find you need the features it provides.
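To make the switch concrete, here is a minimal sketch of how the two modes differ at submit time. It only builds the spark-submit command line and does not run anything. The helper name spark_submit_command and the script name report_job.py are hypothetical, and it assumes a Spark release that takes --master yarn together with --deploy-mode (older releases also accepted yarn-client and yarn-cluster directly as the master value).

```python
def spark_submit_command(deploy_mode, app="report_job.py"):
    """Build a spark-submit argument list for YARN.

    deploy_mode: "client" runs the driver on the submitting machine,
                 "cluster" runs it inside YARN's Application Master.
    app: hypothetical application script, used for illustration only.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app,
    ]


if __name__ == "__main__":
    # Print both variants; only the --deploy-mode value changes.
    for mode in ("client", "cluster"):
        print(" ".join(spark_submit_command(mode)))
```

Since only the --deploy-mode value changes, moving from client to cluster later should not require changes to the job itself, as long as it does not depend on files or network addresses local to the submitting machine.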
Upvotes: 1