Reputation: 7138
I understand the major differences between client and cluster mode for Spark applications on YARN.
Major differences include
In both cases for similarities
My question is: in real-world scenarios (production environments), where we do not need interactive mode and the client does not need to stay running for a long time, is cluster mode the obvious choice?
Are there any benefits for client mode like:
Upvotes: 3
Views: 656
Reputation: 7138
From the documentation,
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the client spark-submit process, with the input and output of the application attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.
So it looks like the main reason to prefer cluster mode is latency: when spark-submit is run from a remote machine, running the driver inside the cluster reduces the latency between the driver and the executors.
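To make the difference concrete, here is a minimal sketch that only builds (and prints) the two spark-submit command lines; it does not launch anything. The jar name `my-app.jar`, the class `com.example.MyApp`, and the helper `build_submit_command` are hypothetical, and I am assuming a Spark version that accepts `--master yarn` together with `--deploy-mode` (older releases used `yarn-client` / `yarn-cluster` as the master value).

```python
import shlex


def build_submit_command(deploy_mode, app_jar, main_class, master="yarn"):
    """Return the spark-submit argument list for the given deploy mode.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,
        # client: the driver runs inside the spark-submit process.
        # cluster: the driver runs inside the cluster, next to the executors.
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]


# Hypothetical application jar and main class, for illustration only.
for mode in ("client", "cluster"):
    cmd = build_submit_command(mode, "my-app.jar", "com.example.MyApp")
    print(shlex.join(cmd))
```

The only thing that changes between the two commands is the `--deploy-mode` value, so switching a job between modes is usually a one-flag change in whatever script or scheduler submits it.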
Upvotes: 2
Reputation: 13926
From my experience, in a production environment the only reasonable mode is cluster mode, with 2 exceptions:
ssh to server that is not accessible from hadoop nodes
ssc.stop(stopGracefully = true)
Upvotes: 1