Connecting SparkR to the spark cluster

Question

I have a spark cluster running on 10 machines (1 - 10) with the master at machine 1. All of these run on CentOS 6.4.

I am trying to connect a jupyterhub installation (which is running inside a ubuntu docker because of issues with installing on CentOS), using sparkR, to the cluster and get the spark context.

The code I am using is

Sys.setenv(SPARK_HOME="/usr/local/spark-1.4.1-bin-hadoop2.4") 
library(SparkR)
sc <- sparkR.init(master="spark://:7077")

The output I get is

attaching package: ‘SparkR’
The following object is masked from ‘package:stats’:
filter
The following objects are masked from ‘package:base’:
intersect, sample, table
Launching java with spark-submit command spark-submit sparkr-shell/tmp/Rtmpzo6esw/backend_port29e74b83c7b3 Error in sparkR.init(master = "spark://10.10.5.51:7077"): JVM is not ready after 10 seconds

Error in sparkRSQL.init(sc): object 'sc' not found

I am using Spark 1.4.1. The spark cluster is also running CDH 5.

The jupyterhub installation can connect to the cluster via pyspark and I have python notebooks which use pyspark.

Can someone tell me what I am doing wrong?

Connecting SparkR to the spark cluster

Answers (1)

Related Questions