Reputation: 209
I have created an IBM BigInsights service with a 5-node Hadoop cluster (including Apache Spark with SparkR). I am trying to use SparkR to connect to a Cloudant DB, fetch some data, and process it.
The SparkR job (an R script) fails when submitted with spark-submit on the BigInsights Hadoop cluster. I created a SparkR script and ran it as follows:
-bash-4.1$ spark-submit --master local[2] test_sparkr.R
16/08/07 17:43:40 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
Error: could not find function "sparkR.init"
Execution halted
-bash-4.1$
Content of test_sparkr.R file is:
# Creating SparkContext and connecting to Cloudant DB
sc <- sparkR.init(sparkEnv = list("cloudant.host"="<<cloudant-host-name>>", "cloudant.username"="<<cloudant-user-name>>", "cloudant.password"="<<cloudant-password>>", "jsonstore.rdd.schemaSampleSize"="-1"))
# Database to be connected to extract the data
database <- "testdata"
# Creating Spark SQL Context
sqlContext <- sparkRSQL.init(sc)
# Creating DataFrame for the "testdata" Cloudant DB
testDataDF <- read.df(sqlContext, database, header='true', source = "com.cloudant.spark",inferSchema='true')
How can I install the spark-cloudant connector in IBM BigInsights and resolve this issue? Any help would be much appreciated.
Upvotes: 1
Views: 450
Reputation: 73722
I believe the spark-cloudant connector doesn’t support R yet.
Hopefully I can update this answer when it is!
Upvotes: 0