Reputation: 21
Since Spark 2.x it is no longer possible to create an org.apache.spark.sql.Row with a scala.None value (it worked in Spark 1.6.x). The call fails with:
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string
Reproducible example:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
spark.createDataFrame(
  sc.parallelize(Seq(Row(None))),
  StructType(Seq(StructField("v", StringType, true)))
).first
Gist: https://gist.github.com/AleksandrPavlenko/bef1c34458883730cc319b2e7378c8c6
It looks like the behavior was changed in SPARK-15657 (not certain; I am still trying to confirm this).
Upvotes: 2
Views: 811
Reputation: 11
This is expected behavior, as described in SPARK-19056 (Row encoder should accept optional types):

This is intentional. Allowing Option in Row is never documented and brings a lot of troubles when we apply the encoder framework to all typed operations. Since Spark 2.0, please use Dataset for typed operation/custom objects.
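In practice this means either passing null (the SQL-level missing value) in the Row, or staying in the typed Dataset API where Option is supported. A minimal sketch, assuming a running SparkSession named spark and SparkContext sc as in the question:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Workaround 1: use null instead of None inside a Row.
// The schema marks the column nullable, so null is the valid "missing" value here.
spark.createDataFrame(
  sc.parallelize(Seq(Row(null))),
  StructType(Seq(StructField("v", StringType, true)))
).first  // Row(null)

// Workaround 2: use the typed Dataset API, which has an encoder for Option.
import spark.implicits._
val ds = Seq(Option.empty[String], Some("x")).toDS()  // Dataset[Option[String]]
ds.toDF("v").first  // None is encoded as a null in the nullable column "v"
```

Both approaches produce a nullable string column; the difference is only whether you work at the untyped Row level or the typed Dataset level.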
Upvotes: 1