Aleksandr Pavlenko

Reputation: 21

Unable to create org.apache.spark.sql.Row with scala.None value since Spark 2.X

Since Spark 2.x it is no longer possible to create an org.apache.spark.sql.Row with a scala.None value (this worked in Spark 1.6.x). It fails with:

Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string

Reproducible example:

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

spark.createDataFrame(
  sc.parallelize(Seq(Row(None))),
  StructType(Seq(StructField("v", StringType, true)))
).first

Gist: https://gist.github.com/AleksandrPavlenko/bef1c34458883730cc319b2e7378c8c6

It looks like this behavior was changed by SPARK-15657 (not sure yet, still trying to confirm).
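For what it's worth, the same DataFrame can still be built in Spark 2.x by passing null instead of None — a minimal sketch, assuming the usual spark-shell bindings (spark, sc):

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Row expects external types matching the schema: for a nullable
// StringType column that means a String or null, not Option[String].
spark.createDataFrame(
  sc.parallelize(Seq(Row(null))),
  StructType(Seq(StructField("v", StringType, true)))
).first
// the resulting row holds a SQL NULL in column "v"
```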

Upvotes: 2

Views: 811

Answers (1)

user7735024

Reputation: 11

This is expected behavior, as described in SPARK-19056 (Row encoder should accept optional types):

This is intentional. Allowing Option in Row is never documented and brings a lot of troubles when we apply the encoder framework to all typed operations. Since Spark 2.0, please use Dataset for typed operations/custom objects.
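Following that recommendation, Option values are encoded natively by Dataset — a sketch, assuming a spark-shell session with spark in scope:

```scala
import spark.implicits._

// The Dataset encoder handles Option[T] directly:
// None is stored as a SQL NULL in the underlying column.
val ds = Seq(Some("a"), None).toDS()   // Dataset[Option[String]]
ds.toDF("v").show()                    // the None row shows as null
```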

Upvotes: 1
