Reputation: 1007
I'm running a Python script that uses PySpark to connect to a Kubernetes cluster and run jobs on executor pods. The script creates an SQLContext that queries a Snowflake database. However, I'm getting the following exception, which is not descriptive enough:
20/07/15 12:10:39 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at net.snowflake.client.jdbc.internal.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
at net.snowflake.client.jdbc.internal.io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:247)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:68)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:106)
at net.snowflake.client.jdbc.ArrowResultChunk.readArrowStream(ArrowResultChunk.java:117)
at net.snowflake.client.core.SFArrowResultSet.buildFirstChunk(SFArrowResultSet.java:352)
at net.snowflake.client.core.SFArrowResultSet.<init>(SFArrowResultSet.java:230)
at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.getResultSet(SnowflakeResultSetSerializableV1.java:1079)
at net.snowflake.spark.snowflake.io.ResultIterator.liftedTree1$1(SnowflakeResultSetRDD.scala:85)
at net.snowflake.spark.snowflake.io.ResultIterator.<init>(SnowflakeResultSetRDD.scala:78)
at net.snowflake.spark.snowflake.io.SnowflakeResultSetRDD.compute(SnowflakeResultSetRDD.scala:41)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Has anyone run into a similar case? If so, how did you fix it?
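For reference, the read path that triggers the failure looks roughly like this (a simplified sketch; the connection options are placeholders, not my real values):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Placeholder Snowflake connection options.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

df = (sqlContext.read
      .format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("query", "SELECT CURRENT_TIMESTAMP")
      .load())

df.show()  # the exception surfaces here, when executor tasks read the Arrow result chunks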
Upvotes: 0
Views: 1235
Reputation: 61
I ran into the same problem and was able to fix it. I figured out that io.netty.tryReflectionSetAccessible needs to be explicitly set to true on Java 9+ for the Spark-Snowflake connector to be able to read data returned from Snowflake in Kubernetes executor pods.
Since the io.netty packages are shaded in snowflake-jdbc, the property has to be qualified with the full package name, i.e. net.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true.
This property needs to be set as a JVM option of the Spark executor pods, which can be done through the spark.executor.extraJavaOptions Spark property. For example:
Property name: spark.executor.extraJavaOptions
Value: -Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true
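If you build the session from Python, the same property can be set programmatically; a minimal sketch (the app name is arbitrary):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("snowflake-query")  # arbitrary app name
    # Executor JVMs are launched after the session is created,
    # so this option takes effect on the executor pods.
    .config(
        "spark.executor.extraJavaOptions",
        "-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true",
    )
    .getOrCreate()
)

Note that this works for executor options only; the driver JVM is already running by the time the builder executes, so driver-side JVM options have to be passed on the command line instead.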
Upvotes: 1
Reputation: 10362
Note: the changes below were made on my local system; they may or may not work for you.
Posting the steps for how I fixed the issue.
I had a similar issue where my brew-installed Spark used openjdk@11 by default. To fix it, I changed the Java version from openjdk@11 to Oracle JDK 1.8 (you can use OpenJDK 1.8 instead of Oracle JDK 1.8).
> cat spark-submit
#!/bin/bash
JAVA_HOME="/root/.linuxbrew/opt/openjdk@11" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark" "$@"
After changing the Java version from openjdk@11 to Oracle JDK 1.8, my spark-submit script looks like this:
> cat spark-submit
#!/bin/bash
JAVA_HOME="/usr/share/jdk1.8.0_202" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark" "$@"
Another workaround is to pass the following conf to your spark-submit:
spark-submit \
--conf 'spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
--conf 'spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
...
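The executor flag can also be set from PySpark when creating the session; a minimal sketch (in client mode, spark.driver.extraJavaOptions cannot be set this way because the driver JVM has already started, so pass the driver option via spark-submit as above):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Unshaded property name, matching the spark-submit workaround above.
    .config(
        "spark.executor.extraJavaOptions",
        "-Dio.netty.tryReflectionSetAccessible=true",
    )
    .getOrCreate()
)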
Upvotes: 0