Pari Margu

Reputation: 209

SparkR job (R script) submitted using spark-submit fails in BigInsights Hadoop cluster

I have created an IBM BigInsights service with a Hadoop cluster of 5 nodes (including Apache Spark with SparkR). I am trying to use SparkR to connect to a Cloudant DB, fetch some data, and do some processing.

Submitting the SparkR job (an R script) with spark-submit fails on the BigInsights Hadoop cluster. I created a SparkR script and ran the following command:

-bash-4.1$ spark-submit --master local[2] test_sparkr.R
16/08/07 17:43:40 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
Error: could not find function "sparkR.init"
Execution halted
-bash-4.1$

Content of test_sparkr.R file is:

# Creating SparkContext and connecting to Cloudant DB
sc <- sparkR.init(sparkEnv = list("cloudant.host"="<<cloudant-host-name>>","cloudant.username"="<<cloudant-user-name>>","cloudant.password"="<<cloudant-password>>", "jsonstore.rdd.schemaSampleSize"="-1"))

# Database to be connected to extract the data
database <- "testdata"
# Creating Spark SQL Context
sqlContext <- sparkRSQL.init(sc)
# Creating DataFrame for the "testdata" Cloudant DB
testDataDF <- read.df(sqlContext, database, header='true', source = "com.cloudant.spark",inferSchema='true')

How can I install the spark-cloudant connector in IBM BigInsights and resolve this issue? Any help would be much appreciated.

Upvotes: 1

Views: 450

Answers (1)

JasonSmith

Reputation: 73722

I believe that the spark-cloudant connector isn’t for R yet.

Hopefully I can update this answer when it is!
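
Separately, the `could not find function "sparkR.init"` error in the output usually means the SparkR package was never attached in the script, which is independent of the connector. A minimal sketch of the top of the script, assuming the Spark 1.x SparkR API used in the question and that SparkR is on R's library path when launched through spark-submit (the `appName` value is only illustrative):

```r
# Attach SparkR first; without this, sparkR.init() is not defined.
# Assumption: spark-submit makes the SparkR package visible to R.
library(SparkR)

# Spark 1.x style initialization, matching the calls in the question.
# "test_sparkr" is an illustrative application name, not from the question.
sc <- sparkR.init(appName = "test_sparkr")
sqlContext <- sparkRSQL.init(sc)

# ... job logic goes here ...

# Shut down the Spark context when the job is done.
sparkR.stop()
```

This only addresses the missing-function error; it does not make the Cloudant data source available from R.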

Upvotes: 0
