Dushyant

Reputation: 81

Connecting to HANA from Spark

I am writing a Python app to load data from SAP HANA:

from pyspark.sql import DataFrameReader

# sqlContext is an existing SQLContext; <user>/<pwd> are the HANA credentials
dfr = DataFrameReader(sqlContext)
df = dfr.jdbc(url='jdbc:sap://ip_hana:30015/?user=<user>&password=<pwd>', table=table)
df.show()

It throws an error saying:

py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
    - object not serializable (class: com.sap.db.jdbc.topology.Host, value: <ip>:30015)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [])
    - writeObject data (class: java.util.Hashtable)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, name: properties$1, type: class java.util.Properties)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, <function0>)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, name: org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getConnection, type: interface scala.Function0)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, JDBCRDD[5] at showString at NativeMethodAccessorImpl.java:-2)
    - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
    - object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@57931c92)
    - writeObject data (class: scala.collection.immutable.$colon$colon)
    - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@57931c92))
    - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
    - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2)
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2,<function2>))
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:865)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)

How do we resolve this?

Upvotes: 2

Views: 2432

Answers (1)

gator2000

Reputation: 31

You probably need to use a newer version of the HANA JDBC driver (ngdbc.jar), as per this page. The stack trace shows Spark trying to serialize the JDBC connection properties, and older driver versions put a non-serializable com.sap.db.jdbc.topology.Host object in them; newer driver versions fix this.
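
For reference, a minimal sketch of the same read once the newer driver jar is available (an illustration, not the asker's code: it assumes the jar is shipped to Spark, e.g. via spark-submit --jars /path/to/ngdbc.jar --driver-class-path /path/to/ngdbc.jar, and it moves the credentials out of the URL into the properties dict; sqlContext and table are the same objects as in the question):

# Assumes a newer ngdbc.jar is already on the Spark classpath, e.g.
# spark-submit --jars /path/to/ngdbc.jar --driver-class-path /path/to/ngdbc.jar app.py
df = sqlContext.read.jdbc(
    url='jdbc:sap://ip_hana:30015',
    table=table,
    properties={'user': '<user>',
                'password': '<pwd>',
                'driver': 'com.sap.db.jdbc.Driver'})  # explicit HANA driver class
df.show()

Passing the credentials and driver class through properties keeps them out of the URL and makes it explicit which driver Spark should load.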

Upvotes: 3
