Ged

Reputation: 18108

Why does Some(null) throw NullPointerException in Spark 2.4 (but worked in 2.2)?

This code worked under Spark 2.2 with Scala 2.11.x, but fails in Spark 2.4.

val df = Seq(
  (1, Some("a"), Some(1)),
  (2, Some(null), Some(2)),
  (3, Some("c"), Some(3)),
  (4, None, None)
).toDF("c1", "c2", "c3")

I ran it in Spark 2.4 and it now gives the error:

scala> spark.version
res0: String = 2.4.0

scala> :pa
// Entering paste mode (ctrl-D to finish)

val df = Seq(
  (1, Some("a"), Some(1)),
  (2, Some(null), Some(2)),
  (3, Some("c"), Some(3)),
  (4, None, None)
).toDF("c1", "c2", "c3")

// Exiting paste mode, now interpreting.

java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#6
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, unwrapoption(ObjectType(class java.lang.String), assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2), true, false) AS _2#7
unwrapoption(IntegerType, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3) AS _3#8
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:293)
  at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:472)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
  at scala.collection.immutable.List.foreach(List.scala:388)
  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
  at scala.collection.immutable.List.map(List.scala:294)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:472)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
  ... 57 elided
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
  ... 66 more

I'm curious what has changed and why replacing the line:

(2, Some(null), Some(2)),

with:

(2, None, Some(2)),

resolves the issue.

What has changed, and what does it mean for an existing code base?

Upvotes: 1

Views: 5269

Answers (1)

Ged

Reputation: 18108

This is considered a bug and has been reported as SPARK-26984.
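
For existing code, one way to avoid hitting this is to never build Some(null) in the first place. The following is a minimal plain-Scala sketch, not an official fix: Option(x) turns a null into None, whereas Some(x) wraps the null as-is. The object name SomeNullDemo and the helper normalize are hypothetical names introduced only for illustration.

```scala
object SomeNullDemo {
  // Some(...) wraps whatever it is given, including null.
  val wrapped: Option[String] = Some(null)

  // Option(...) maps a null argument to None.
  val safe: Option[String] = Option(null: String)

  // Hypothetical helper: turn Some(null) into None, leave other values alone.
  def normalize[A](o: Option[A]): Option[A] = o.flatMap(Option(_))

  // Same rows as in the question, with the element type spelled out.
  val rows: Seq[(Int, Option[String], Option[Int])] = Seq(
    (1, Some("a"), Some(1)),
    (2, Some(null), Some(2)),
    (3, Some("c"), Some(3)),
    (4, None, None)
  )

  // Normalize the string column before handing the rows to Spark.
  val cleaned: Seq[(Int, Option[String], Option[Int])] =
    rows.map { case (c1, c2, c3) => (c1, normalize(c2), c3) }
}
```

With a SparkSession and spark.implicits._ in scope, SomeNullDemo.cleaned.toDF("c1", "c2", "c3") is then the same call as in the question, but row 2 carries None instead of Some(null), which is the variant the question reports as working.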

Upvotes: 3
