juanp_1982

Reputation: 1007

got java.lang.UnsupportedOperationException on executor pod

I'm running a Python script with PySpark that connects to a Kubernetes cluster and runs jobs on executor pods. The idea of the script is to create an SQLContext that queries a Snowflake database. However, I'm getting the following exception, and it isn't descriptive enough for me to tell what's wrong:

20/07/15 12:10:39 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at net.snowflake.client.jdbc.internal.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:247)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:68)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:106)
    at net.snowflake.client.jdbc.ArrowResultChunk.readArrowStream(ArrowResultChunk.java:117)
    at net.snowflake.client.core.SFArrowResultSet.buildFirstChunk(SFArrowResultSet.java:352)
    at net.snowflake.client.core.SFArrowResultSet.<init>(SFArrowResultSet.java:230)
    at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.getResultSet(SnowflakeResultSetSerializableV1.java:1079)
    at net.snowflake.spark.snowflake.io.ResultIterator.liftedTree1$1(SnowflakeResultSetRDD.scala:85)
    at net.snowflake.spark.snowflake.io.ResultIterator.<init>(SnowflakeResultSetRDD.scala:78)
    at net.snowflake.spark.snowflake.io.SnowflakeResultSetRDD.compute(SnowflakeResultSetRDD.scala:41)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
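
For context, the read in my script looks roughly like the sketch below; the app name, connection parameters, and query are placeholders rather than my real values:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("snowflake-query"))  # placeholder app name
sqlContext = SQLContext(sc)

# placeholder Snowflake connection options
sfOptions = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

df = (
    sqlContext.read.format("net.snowflake.spark.snowflake")
    .options(**sfOptions)
    .option("query", "SELECT ...")  # placeholder query
    .load()
)

# the executor tasks fail while materializing the Arrow result chunks
df.show()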

Has anyone run into a similar case? If so, how did you fix it?

Upvotes: 0

Views: 1235

Answers (2)

Dipaan

Reputation: 61

I ran into the same problem and was able to fix it. I figured out that io.netty.tryReflectionSetAccessible needs to be explicitly set to true on Java >= 9 for the Spark-Snowflake connector to be able to read the data returned from Snowflake inside the Kubernetes executor pods.

Now, since the io.netty packages are shaded inside snowflake-jdbc, we need to qualify the property with the full shaded package name, i.e. net.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true.

This property needs to be set as a JVM option on the Spark executor pods, which can be done through the executor extra JVM options Spark property. For example:

Property Name: spark.executor.extraJavaOptions

Value: -Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true
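
Equivalently, here is a minimal sketch of setting the same property from PySpark when building the session (the app name is a placeholder; this works for executor options because the executor pods are launched after this configuration is applied):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("snowflake-query")  # placeholder app name
    # shaded netty property from above, so Arrow can allocate direct buffers on Java 9+
    .config(
        "spark.executor.extraJavaOptions",
        "-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true",
    )
    .getOrCreate()
)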

Upvotes: 1

s.polam

Reputation: 10362

Note: The changes below were made on my local system; they may or may not work for you.

Posting the steps I used to fix the issue.

I had a similar issue: my brew-installed Spark was using openjdk@11 by default. To fix it, I changed the Java version from openjdk@11 to Oracle JDK 1.8 (you can use OpenJDK 1.8 instead of Oracle JDK 1.8).

> cat spark-submit
#!/bin/bash
JAVA_HOME="/root/.linuxbrew/opt/openjdk@11" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark"  "$@"

After changing the Java version from openjdk@11 to Oracle JDK 1.8, my spark-submit wrapper looks like this:

> cat spark-submit
#!/bin/bash
JAVA_HOME="/usr/share/jdk1.8.0_202" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark"  "$@"

Another workaround for this issue is to set the following conf on your spark-submit:

spark-submit \
--conf 'spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
--conf 'spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
...

Upvotes: 0
