Reputation: 81
I am writing a Python app that uses Spark to load data from SAP HANA over JDBC.
from pyspark.sql import SQLContext, DataFrameReader

sqlContext = SQLContext(sc)  # sc is the existing SparkContext
dfr = DataFrameReader(sqlContext)
df = dfr.jdbc(url='jdbc:sap://ip_hana:30015/?user=<user>&password=<pwd>', table=table)
df.show()
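For completeness, the equivalent call through sqlContext.read looks like this (a minimal sketch I have not verified against this error; the credentials are passed via the properties dict instead of the URL, and <user>, <pwd>, and table are placeholders):

# same JDBC read, expressed via the SQLContext's built-in reader
properties = {'user': '<user>', 'password': '<pwd>'}
df = sqlContext.read.jdbc(url='jdbc:sap://ip_hana:30015', table=table, properties=properties)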
The df.show() call throws an error saying:
py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
- object not serializable (class: com.sap.db.jdbc.topology.Host, value: <ip>:30015)
- writeObject data (class: java.util.ArrayList)
- object (class java.util.ArrayList, [])
- writeObject data (class: java.util.Hashtable)
- field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, name: properties$1, type: class java.util.Properties)
- object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1, <function0>)
- field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, name: org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getConnection, type: interface scala.Function0)
- object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, JDBCRDD[5] at showString at NativeMethodAccessorImpl.java:-2)
- field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
- object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@57931c92)
- writeObject data (class: scala.collection.immutable.$colon$colon)
- object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@57931c92))
- field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
- object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (MapPartitionsRDD[6] at showString at NativeMethodAccessorImpl.java:-2,<function2>))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:865)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
How can this error be resolved?
Upvotes: 2
Views: 2432