Joseph Pride

Reputation: 165

SnappyData: Connect Standalone Spark Job to Embedded Cluster

What I'm trying to achieve is similar to Smart Connector mode, but the documentation isn't helping me much, because its Smart Connector examples are based on spark-shell, whereas I'm trying to run a standalone Scala application. Therefore, I can't use the --conf arguments that the spark-shell examples rely on.
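
(As far as I understand, each spark-shell --conf key=value flag corresponds to a .config("key", "value") call on the SparkSession builder, so a standalone sketch of the documented examples would look roughly like this; the key and value below are just placeholders:)

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: a spark-shell "--conf key=value" flag becomes a
// .config("key", "value") call when building the session programmatically.
val spark = SparkSession
  .builder()
  .appName("MyApp")
  .config("some.conf.key", "some-value") // placeholder key/value
  .getOrCreate()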

Trying to find my Spark master, I looked at the SnappyData web interface and found the following:

host-data="false"
 locators="xxx.xxx.xxx.xxx:10334"
 log-file="snappyleader.log"
 mcast-port="0"
 member-timeout="30000"
 persist-dd="false"
 route-query="false"
 server-groups="IMPLICIT_LEADER_SERVERGROUP"
 snappydata.embedded="true"
 spark.app.name="SnappyData"
 spark.closure.serializer="org.apache.spark.serializer.PooledKryoSerializer"
 spark.driver.host="xxx.xxx.xxx.xxx"
 spark.driver.port="37838"
 spark.executor.id="driver"
 spark.local.dir="/var/opt/snappydata/lead1/scratch"
 spark.master="snappydata://xxx.xxx.xxx.xxx:10334"
 spark.memory.manager="org.apache.spark.memory.SnappyUnifiedMemoryManager"
 spark.memory.storageFraction="0.5"
 spark.scheduler.mode="FAIR"
 spark.serializer="org.apache.spark.serializer.PooledKryoSerializer"
 spark.ui.port="5050"
 statistic-archive-file="snappyleader.gfs"
--- end --

(The IP addresses are all on one host, for now.)

I have a simple example Spark job, just to test getting my cluster working:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SnappySession
import org.apache.spark.sql.Dataset

object snappytest {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("SnappyTest")
      .master("snappydata://xxx.xxx.xxx.xxx:10334")
      .getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    import spark.implicits._

    val caseClassDS = Seq(Person("Andy", 35)).toDS()
    println(Person)
    println(snappy)
    println(spark)
  }
}

And I got this error:

17/10/25 14:44:57 INFO ServerConnector: Started Spark@ffaaaf0{HTTP/1.1}{0.0.0.0:4040}
17/10/25 14:44:57 INFO Server: Started @2743ms
17/10/25 14:44:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/25 14:44:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx.xxx.xxx.xxx:4040
17/10/25 14:44:57 INFO SnappyEmbeddedModeClusterManager: setting from url snappydata.store.locators with xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: cluster configuration after overriding certain properties 
jobserver.enabled=false
snappydata.embedded=true
snappydata.store.host-data=false
snappydata.store.locators=xxx.xxx.xxx.xxx:10334
snappydata.store.persist-dd=false
snappydata.store.server-groups=IMPLICIT_LEADER_SERVERGROUP
spark.app.name=SnappyTest
spark.driver.host=xxx.xxx.xxx.xxx
spark.driver.port=35602
spark.executor.id=driver
spark.master=snappydata://xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: passing store properties as {spark.driver.host=xxx.xxx.xxx.xxx, snappydata.embedded=true, spark.executor.id=driver, persist-dd=false, spark.app.name=SnappyTest, spark.driver.port=35602, spark.master=snappydata://xxx.xxx.xxx.xxx:10334, member-timeout=30000, host-data=false, default-startup-recovery-delay=120000, server-groups=IMPLICIT_LEADER_SERVERGROUP, locators=xxx.xxx.xxx.xxx:10334}
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
Exception in thread "main" org.apache.spark.SparkException: Primary Lead node (Spark Driver) is already running in the system. You may use smart connector mode to connect to SnappyData cluster.

So how do I (should I?) use Smart Connector mode in this case?

Upvotes: 2

Views: 296

Answers (1)

Yogesh Mahajan

Reputation: 241

You need to specify the following in your example Spark job:

.master("local[*]")
.config("snappydata.connection", "xxx.xxx.xxx.xxx:1527")

Upvotes: 2
