Reputation: 11
I am trying to load data from a Cloudant database into a Python/Spark DataFrame in a Python and Spark environment in Watson Studio. I followed the steps described in this link and am stuck at Procedure 3, Step 5. I already have a Cloudant database named 'twitterdb' and am trying to load data from it.
Error screenshot
Upvotes: -1
Views: 330
Reputation: 2155
Looking at the error, it appears you installed a Cloudant connector that does not match the Spark version available on Spark as a Service from IBM Cloud, which offers Spark 2.1.2.
One of the steps in the tutorial installs the Spark Cloudant package:
pixiedust.installPackage("org.apache.bahir:spark-sql-cloudant_2.11:0")
I believe this installs the wrong version of the Spark Cloudant connector, since the error states it is trying to use:
/gpfs/global_fs01/sym_shared/YPProdSpark/user/s97c-0d96df4a6a0cd8-8754c7852bb5/data/libs/spark-sql-cloudant_2.11-2.2.1.jar
The right version to install/use would be https://mvnrepository.com/artifact/org.apache.bahir/spark-sql-cloudant_2.11/2.1.2
The important part is that a Spark Cloudant connector is already installed by default under /usr/local/src/dataconnector-cloudant-2.0/spark-2.0.0/libs/.
So you should uninstall your user-installed package using pixiedust:
pixiedust.packageManager.uninstallPackage("org.apache.bahir:spark-sql-cloudant_2.11:2.2.1")
Then restart the kernel and use the Cloudant connector as described to read from your Cloudant database:
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host", "ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password", "PASSWORD")\
    .config("jsonstore.rdd.partitions", 8)\
    .getOrCreate()

# 1. Load a DataFrame from a Cloudant database
df = spark.read.load("n_airportcodemapping", "org.apache.bahir.cloudant")
df.cache()
df.printSchema()
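Since your database is named 'twitterdb', the same read pattern should apply to it. This is a minimal sketch, assuming the default connector is used and the SparkSession above has been configured with your actual Cloudant host and credentials (it needs a live Cloudant service to run):

```python
# Sketch: load the asker's 'twitterdb' database via the default
# org.apache.bahir.cloudant connector; requires a reachable Cloudant service.
df_tweets = spark.read.load("twitterdb", "org.apache.bahir.cloudant")
df_tweets.printSchema()   # inspect the schema inferred from the JSON documents
df_tweets.show(5)         # preview a few rows
```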
Ref:- https://github.com/apache/bahir/tree/master/sql-cloudant
Thanks, Charles.
Upvotes: 1