pac

Reputation: 501

How to run different Spark versions on each node in a cluster?

Can I have an Apache Spark cluster where different nodes run different versions of Spark? For example, could I have a master which is Spark 2.2.0, one node that is 2.0.1, another that is 2.2.0 and another that is 1.6.3, or should all nodes have the same version of Spark?

Upvotes: 1

Views: 496

Answers (2)

Jacek Laskowski

Reputation: 74619

Can I have an apache Spark cluster where different nodes run different versions of Spark?

No. This is not possible.

The reason is that there is no real notion of a Spark installation. Spark is a library, and as such it is a dependency of an application that, once submitted for execution, is deployed and executed on the cluster nodes (at least one of them, i.e. the driver).

With that said, the version of the Spark dependency of your application is exactly the version of Spark in use. To be precise, it is the version of the spark-submit in use (unless you use a so-called uber-jar with the Spark dependency bundled).
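As a minimal sketch of what "Spark as a dependency" looks like, here is a hypothetical build.sbt that pins the Spark version the application is compiled against; the artifact, Scala version, and "provided" scope are assumptions for illustration, not part of the question:

    // build.sbt (hypothetical) -- the Spark version is just an application
    // dependency; "provided" means spark-submit supplies the jars at runtime
    scalaVersion := "2.11.12"

    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"

Whatever version you declare here (or bundle into an uber-jar) is the version your code is built against; at runtime, the spark-submit you invoke brings the matching Spark distribution onto the nodes.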

Upvotes: 0

args

Reputation: 532

Usually, when we want to run different versions of Spark on a cluster, all the versions are installed on all the nodes. Which version is used for execution depends on which spark-submit (Spark 1.6, Spark 2.0, or Spark 2.2) is invoked to run the script.

Let's say we installed Spark 1.6 on the master node only. When we submit a job to the cluster and the master node is fully utilized, the YARN ResourceManager will look for a node that is free to run the job; YARN will not wait until the master node frees up resources, it will schedule the job on whichever node has free resources. So, for this reason, every version of Spark has to be installed on all nodes of the cluster.
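If several versions are installed side by side, one quick way to see which one a given spark-submit actually launched is to print the runtime version from inside the job. This is only an illustrative sketch and assumes the Spark 2.x SparkSession API (a 1.6 job would use SparkContext instead); the object name and paths are made up:

    // VersionCheck.scala (hypothetical) -- prints the Spark version the job
    // is really running on, i.e. the one belonging to the spark-submit used
    import org.apache.spark.sql.SparkSession

    object VersionCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("VersionCheck").getOrCreate()
        println(s"Running on Spark ${spark.version}")
        spark.stop()
      }
    }

Submitting the same jar through, say, /path/to/spark-2.2.0/bin/spark-submit versus another installation's spark-submit is what decides which version the line above reports.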

Upvotes: 1
