user3612324

Reputation: 57

Connecting SparkR to the Spark cluster

I have a Spark cluster running on 10 machines (1-10), with the master on machine 1. All of them run CentOS 6.4.

I am trying to connect a JupyterHub installation (running inside an Ubuntu Docker container because of issues installing it on CentOS) to the cluster via SparkR and get the Spark context.

The code I am using is

Sys.setenv(SPARK_HOME="/usr/local/spark-1.4.1-bin-hadoop2.4") 
library(SparkR)
sc <- sparkR.init(master="spark://<master-ip>:7077")

The output I get is

Attaching package: ‘SparkR’
The following object is masked from ‘package:stats’:
filter
The following objects are masked from ‘package:base’:
intersect, sample, table
Launching java with spark-submit command spark-submit sparkr-shell /tmp/Rtmpzo6esw/backend_port29e74b83c7b3
Error in sparkR.init(master = "spark://10.10.5.51:7077"): JVM is not ready after 10 seconds

Error in sparkRSQL.init(sc): object 'sc' not found
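For reference, the underlying spark-submit failure can often be surfaced by launching the same command sparkR.init uses by hand; a minimal diagnostic sketch (the SPARK_HOME path is the one from the code above, everything else is an assumption):

Sys.setenv(SPARK_HOME = "/usr/local/spark-1.4.1-bin-hadoop2.4")
spark_submit <- file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit")
# If this check fails, SPARK_HOME does not point at a Spark install inside
# the Docker container, which is a common cause of "JVM is not ready".
stopifnot(file.exists(spark_submit))
# Print Spark's version banner; any launcher error shows up on the console
# instead of being hidden behind the 10-second JVM startup timeout.
system2(spark_submit, args = "--version")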

I am using Spark 1.4.1. The cluster also runs CDH 5.

The JupyterHub installation can connect to the cluster via PySpark, and I have Python notebooks that use PySpark.

Can someone tell me what I am doing wrong?

Upvotes: 0

Views: 1753

Answers (1)

mnm

Reputation: 2022

I have a similar problem and have been searching all around, but found no solutions. Can you please tell me what you mean by a "JupyterHub installation (running inside an Ubuntu Docker container because of issues installing it on CentOS)"?

We also have 4 clusters on CentOS 6.4. Another problem of mine is how to use an IDE like IPython or RStudio to interact with these 4 servers. Do I use my laptop to connect to these servers remotely (if yes, then how?), and if not, what would be an alternative?

Now, to answer your question, I can give it a try. I think you have to use the yarn-cluster option, as stated here. I hope this helps you solve the problem.
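For what it's worth, here is a minimal sketch of a YARN-based init. Note that an interactive SparkR shell in Spark 1.4 can only use yarn-client mode (yarn-cluster deploy mode is for batch spark-submit jobs, not shells), and the YARN_CONF_DIR path below is an assumed CDH default:

Sys.setenv(SPARK_HOME = "/usr/local/spark-1.4.1-bin-hadoop2.4")
Sys.setenv(YARN_CONF_DIR = "/etc/hadoop/conf")  # assumed CDH default; adjust to your install
library(SparkR)
# An interactive shell must use yarn-client; yarn-cluster would fail because
# cluster deploy mode is not applicable to Spark shells.
sc <- sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)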

Cheers, Ashish

Upvotes: 0
