Reputation: 539
I have been developing in PySpark with Spark in standalone, non-cluster mode. Now I would like to explore Spark's cluster mode. From searching online, it seems I need a cluster manager, such as Apache Mesos or Spark Standalone, to run a cluster across different machines, but I couldn't easily find a detailed picture of how the pieces fit together.
How should I set this up, from a system design point of view, in order to run a Spark cluster across multiple Windows machines (or multiple Windows VMs)?
Upvotes: 5
Views: 5920
Reputation: 74759
You may want to explore (from the simplest) Spark Standalone, through Hadoop YARN to Apache Mesos or DC/OS. See Cluster Mode Overview.
I'd recommend starting with Spark Standalone (the easiest option to submit Spark applications to). Spark Standalone is included in every Spark installation and works fine on Windows. The catch is that Spark ships no Windows scripts to start and stop the standalone Master and Workers (aka slaves), so you have to "code" them yourself.
Use the following to start a standalone Master on Windows:
rem terminal 1
bin\spark-class org.apache.spark.deploy.master.Master
Please note that after you start the standalone Master, the terminal does not return to the prompt. Don't worry: head over to http://localhost:8080/ to see the web UI of the Spark Standalone cluster.
In a separate terminal, start an instance of the standalone Worker:
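Because the Master gives no obvious sign that it is ready, a quick connectivity check can help. The following is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical, and it assumes the default ports (8080 for the web UI, 7077 for the Master) have not been changed.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
```

For example, `port_open("localhost", 8080)` and `port_open("localhost", 7077)` should both return True once the Master is up on the local machine.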
rem terminal 2
bin\spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077
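Since there are no ready-made Windows scripts, the two commands above could be wrapped in a small Python launcher. This is only a sketch: the `C:\spark` install path used in the usage note and the helper names (`spark_class`, `master_command`, `worker_command`, `start`) are assumptions for illustration, not part of Spark.

```python
import ntpath
import subprocess

MASTER_CLASS = "org.apache.spark.deploy.master.Master"
WORKER_CLASS = "org.apache.spark.deploy.worker.Worker"


def spark_class(spark_home):
    # Windows-style path to the launcher script in Spark's bin directory.
    return ntpath.join(spark_home, "bin", "spark-class.cmd")


def master_command(spark_home):
    # Same as: bin\spark-class org.apache.spark.deploy.master.Master
    return [spark_class(spark_home), MASTER_CLASS]


def worker_command(spark_home, master_url):
    # Same as: bin\spark-class org.apache.spark.deploy.worker.Worker <master_url>
    return [spark_class(spark_home), WORKER_CLASS, master_url]


def start(command):
    # Popen returns immediately; the started process keeps running
    # until it is terminated (for example with .terminate()).
    return subprocess.Popen(command)
```

Calling `start(master_command(r"C:\spark"))` and then `start(worker_command(r"C:\spark", "spark://localhost:7077"))` should be roughly equivalent to running the two commands in separate terminals. Keeping the returned `Popen` objects around gives you a way to stop the processes later.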
With the one-worker Spark Standalone cluster up, you should be able to submit Spark applications as follows:
spark-submit --master spark://localhost:7077 ...
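As an illustration of what such an application might look like in PySpark, here is a minimal sketch. The function names, the application name, and the toy job are made up for this example. It assumes `pyspark` is installed and that the Master is reachable at the given URL; for a multi-machine setup you would presumably pass the Master machine's hostname instead of `localhost`.

```python
def master_url(host, port=7077):
    # 7077 is the default port of the standalone Master.
    return "spark://{}:{}".format(host, port)


def count_even_ids(master, n=1000):
    # Imported here so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master)
        .appName("standalone-smoke-test")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # A toy job: count the even ids in 0..n-1 on the cluster.
        return spark.range(n).filter("id % 2 = 0").count()
    finally:
        spark.stop()
```

Running `count_even_ids(master_url("localhost"))` from a Python shell connects to the cluster directly. If you go through `spark-submit --master` instead, the `.master(...)` call can be dropped so that the value from the command line is used.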
Read Spark Standalone Mode in the official documentation of Spark.
As I just found out, Mesos is not an option on Windows given its System Requirements:
Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit).
You could, however, run any of these cluster managers inside virtual machines with VirtualBox or similar. At least DC/OS has dcos-vagrant, which should make it fairly easy:
dcos-vagrant: Quickly provision a DC/OS cluster on a local machine for development, testing, or demonstration.
Deploying DC/OS Vagrant involves creating a local cluster of VirtualBox VMs using the dcos-vagrant-box base image and then installing DC/OS.
Upvotes: 20