Sampat Kumar
Sampat Kumar

Reputation: 502

Failed to Read data from Couchbase using Spark

I have been trying to read data from couchbase , but failing to read due to authentication issue.

import com.couchbase.client.java.document.JsonDocument
import org.apache.spark.sql.SparkSession
import com.couchbase.spark._

object SparkRead {



def main(args: Array[String]): Unit = {

  // The SparkSession is the main entry point into spark
  val spark = SparkSession
    .builder()
    .appName("KeyValueExample")
    .master("local[*]") // use the JVM as the master, great for testing
    .config("spark.couchbase.nodes", "***********") // connect to couchbase on hostname
    .config("spark.couchbase.bucket.beer-sample","") // open the travel-sample bucket with empty password
    .config("spark.couchbase.username", "couchdb")
    .config("spark.couchbase.password", "******")
    .config("spark.couchbase.connectTimeout","30000")
    .config("spark.couchbase.kvTimeout","10000")
    .config("spark.couchbase.socketConnect","10000")
    .getOrCreate()

  spark.sparkContext
    .couchbaseGet[com.couchbase.client.java.document.JsonDocument](Seq("airline_10123")) // Load documents from couchbase
    .collect() // collect all data from the spark workers
    .foreach(println) // print each document content


 }
}

Below is the Build File

name := "KafkaSparkCouchReadWrite"

organization := "my.clairvoyant"

version := "1.0.0-SNAPSHOT"

scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-streaming" % "2.1.0",
  "org.apache.spark" %% "spark-sql" % "2.1.0",
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
  "com.couchbase.client" %% "spark-connector" % "2.1.0",
  "org.glassfish.hk2" % "hk2-utils" % "2.2.0-b27",
  "org.glassfish.hk2" % "hk2-locator" % "2.2.0-b27",
  "javax.validation" % "validation-api" % "1.1.0.Final",
  "org.apache.kafka" %% "kafka" % "0.11.0.0",
  "com.googlecode.json-simple" % "json-simple" % "1.1").map(_.excludeAll(ExclusionRule("org.glassfish.hk2"),ExclusionRule("javax.validation")))

ERROR LOG

17/12/12 15:18:35 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.33.220, 52402, None) 17/12/12 15:18:35 INFO SharedState: Warehouse path is 'file:/Users/sampat/Desktop/GitClairvoyant/cpdl3-poc/KafkaSparkCouchReadWrite/spark-warehouse/'. 17/12/12 15:18:35 INFO CouchbaseCore: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword=false, sslKeystore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=8, computationPoolSize=8, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=12, queryServiceEndpoints=12, searchServiceEndpoints=12, ioPool=NioEventLoopGroup, kvIoPool=null, viewIoPool=null, searchIoPool=null, queryIoPool=null, coreScheduler=CoreScheduler, memcachedHashingStrategy=DefaultMemcachedHashingStrategy, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.4.2 (git: 2.4.2, core: 1.4.2), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2@7b7b3edb, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, dnsSrvEnabled=false} 17/12/12 15:18:37 WARN Endpoint: [null][KeyValueEndpoint]: Authentication Failure. 17/12/12 15:18:37 INFO Endpoint: [null][KeyValueEndpoint]: Got notified from Channel as inactive, attempting reconnect. 17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401 17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401 Exception in thread "main" com.couchbase.client.java.error.InvalidPasswordException: Passwords for bucket "beer-sample" do not match. at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:601) at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:584) at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140) at rx.internal.operators.OnSubscribeMap$MapSubscriber.onError(OnSubscribeMap.java:88)

Upvotes: 0

Views: 980

Answers (2)

mahendra singh
mahendra singh

Reputation: 34

You will need to set the following couchbase configurations as system properties:

System.setProperty("com.couchbase.connectTimeout", "30000");
System.setProperty("com.couchbase.kvTimeout", "10000");
System.setProperty("com.couchbase.socketConnect", "10000");

Upvotes: 0

Sampat Kumar
Sampat Kumar

Reputation: 502

Sample Code :

import com.couchbase.client.java.document.JsonDocument
import com.couchbase.spark._
import org.apache.spark.sql.SparkSession

object SparkReadCouchBase {

def main(args: Array[String]): Unit = {

val spark = SparkSession
  .builder()
  .appName("KeyValueExample")
  .master("local[*]") // use the JVM as the master, great for testing
  .config("spark.couchbase.nodes", "127.0.0.1") // connect to couchbase on hostname
  .config("spark.couchbase.bucket.travel-sample","") // open the travel-sample bucket with empty password
  .config("com.couchbase.username", "*******")
  .config("com.couchbase.password", "*******")
  .config("com.couchbase.kvTimeout","10000")
  .config("com.couchbase.connectTimeout","30000")
  .config("com.couchbase.socketConnect","10000")
  .getOrCreate()
println("=====================================================================================")
spark.sparkContext
  .couchbaseGet[JsonDocument](Seq("airline_10123")) // Load documents from couchbase
  .collect() // collect all data from the spark workers
  .foreach(println) // print each document content
println("=====================================================================================")


  }
}

POM.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
  http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>com.*******.*****</groupId>
<artifactId>KafkaSparkCouch</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<properties>
    <java.version>1.8</java.version>
    <spark.version>2.2.0</spark.version>
    <scala.version>2.11.8</scala.version>
    <scala.parent.version>2.11</scala.parent.version>
    <kafka.client.version>0.11.0.0</kafka.client.version>
    <fat.jar.name>SparkCouch</fat.jar.name>
    <scala.binary.version>2.11</scala.binary.version>
    <main.class>com.*******.demo.spark.couchbase.SparkReadCouchBaseTest</main.class>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.parent.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.parent.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_${scala.parent.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.parent.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>com.couchbase.client</groupId>
        <artifactId>spark-connector_${scala.parent.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>${kafka.client.version}</version>
    </dependency>

    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>${java.version}</source>
                <target>${java.version}</target>
            </configuration>
        </plugin>
        <!--Create fat-jar file-->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <version>2.15.2</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                    <phase>compile</phase>
                </execution>

                <execution>
                    <id>test-compile</id>
                    <goals>
                        <goal>testCompile</goal>
                    </goals>
                    <phase>test-compile</phase>
                </execution>

                <execution>
                    <phase>process-resources</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.1.6</version>
            <configuration>
                <scalaCompatVersion>${scala.binary.version}</scalaCompatVersion>
                <scalaVersion>${scala.version}</scalaVersion>
            </configuration>
            <!-- other settings-->
        </plugin>
    </plugins>
</build>
<repositories>
    <repository>
        <id>mavencentral</id>
        <url>http://repo1.maven.org/maven2/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>scala</id>
        <name>Scala Tools</name>
        <url>http://scala-tools.org/repo-releases/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

<pluginRepositories>
    <pluginRepository>
        <id>scala</id>
        <name>Scala Tools</name>
        <url>http://scala-tools.org/repo-releases/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </pluginRepository>
</pluginRepositories>
<name>KafkaSparkCouch</name>

Upvotes: 0

Related Questions