Sami Badawi

Reputation: 1032

How to run Spark standalone on Kubernetes?

I have a Scala application that is using Spark 2.1 in standalone mode. The application will run for 2 hours and finish. It should be run once a month.

I found several approaches to combine Spark and Kubernetes:

  1. Use Apache Spark Helm Chart: https://github.com/kubernetes/charts/tree/master/stable/spark
  2. There is a special branch of Spark for Kubernetes: https://github.com/apache-spark-on-k8s/spark
  3. Build my own Docker image of my application including the Spark binary: http://blog.madhukaraphatak.com/scaling-spark-with-kubernetes-part-5/ Code example: https://github.com/phatak-dev/kubernetes-spark

Most of the documentation describes how to run a Spark cluster on Kubernetes. What is the approach for running Spark standalone on Kubernetes?

Upvotes: 3

Views: 4405

Answers (3)

Jiri Kremser

Reputation: 12847

Check out my project: https://github.com/radanalyticsio/spark-operator

It deploys standalone Spark on Kubernetes and OpenShift, and also supports the spark-on-k8s native scheduler. The default Spark version is 2.4.0.

You can find a quick start in the project's README; here is one way to deploy a Spark cluster using the operator:

# create operator
kubectl apply -f https://raw.githubusercontent.com/radanalyticsio/spark-operator/master/manifest/operator.yaml

# create cluster
cat <<EOF | kubectl apply -f -
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-cluster
spec:
  worker:
    instances: "2"
EOF

Upvotes: 1

Sami Badawi

Reputation: 1032

I first tried the simplest idea, approach 3:

Build my own Docker image of my application including the Spark binary: http://blog.madhukaraphatak.com/scaling-spark-with-kubernetes-part-5/

Code example: https://github.com/phatak-dev/kubernetes-spark

It worked well.
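
For the once-a-month schedule mentioned in the question, an image built this way could be wrapped in a Kubernetes CronJob. The following is only a minimal sketch, not something taken from the linked blog post: the image name, master Service name, main class, jar path, and the `/opt/spark` install location are all hypothetical placeholders, and `batch/v1` CronJob assumes a reasonably recent cluster (Kubernetes 1.21 or later). The script only prints the manifest, so it can be inspected before being applied.

```shell
#!/bin/sh
# Sketch: print a Kubernetes CronJob manifest that runs spark-submit once a month.
# Every default below is a hypothetical placeholder; override via environment variables.

IMAGE="${IMAGE:-example.com/my-spark-app:latest}"      # hypothetical image containing Spark + the app jar
MASTER_URL="${MASTER_URL:-spark://spark-master:7077}"  # hypothetical standalone master Service
MAIN_CLASS="${MAIN_CLASS:-com.example.MonthlyJob}"     # hypothetical main class of the Scala app
APP_JAR="${APP_JAR:-/opt/app/monthly-job.jar}"         # hypothetical jar path inside the image

render_cronjob() {
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: monthly-spark-job
spec:
  schedule: "0 2 1 * *"        # 02:00 on the first day of every month
  concurrencyPolicy: Forbid    # do not start a new run while one is still going
  jobTemplate:
    spec:
      backoffLimit: 0          # do not automatically retry a failed long run
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: spark-driver
            image: ${IMAGE}
            command: ["/opt/spark/bin/spark-submit"]   # assumed Spark install path
            args:
            - "--master"
            - "${MASTER_URL}"
            - "--class"
            - "${MAIN_CLASS}"
            - "${APP_JAR}"
EOF
}

# Print the manifest to stdout.
render_cronjob
```

To apply it, the output could be piped to kubectl, for example `sh cronjob.sh | kubectl apply -f -` (the script file name is also just an example).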

Upvotes: 1

Anirudh Ramanathan

Reputation: 46778

For standalone Spark on Kubernetes, the two canonical samples are:

  1. https://github.com/kubernetes/charts/tree/master/stable/spark
  2. https://github.com/kubernetes/examples/tree/master/staging/spark

These currently run outdated versions of Spark and need updating to 2.1, and soon 2.2 (PRs are welcome :)).

The https://github.com/apache-spark-on-k8s/spark branch is not for standalone mode; it aims to let Spark launch directly on Kubernetes clusters. It will eventually be merged into upstream Spark. Documentation, if you wish to make use of it, is here.

As of now, if you want to use Spark 2.1, the options are either to build your own image or to package your application with the Spark distribution in apache-spark-on-k8s.

Upvotes: 1
