Reputation: 41
Consider the following case class schema:
case class Y (a: String, b: String)
case class X (dummy: String, b: Y)
The field b is optional; some of my data sets don't have it. When I try to read a JSON string that doesn't contain b, I get a missing-field exception.
spark.read.json(Seq("{'dummy': '1', 'b': {'a': '1'}}").toDS).as[X]
org.apache.spark.sql.AnalysisException: No such struct field b in a;
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.findField(complexTypeExtractors.scala:85)
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:53)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$resolveExpression$1.applyOrElse(Analyzer.scala:1074)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$resolveExpression$1.applyOrElse(Analyzer.scala:1065)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
How do I automatically deserialize fields that aren't present in the JSON as null?
Upvotes: 3
Views: 606
Reputation: 31540
Define the b field as an Option type and use Encoders to derive a struct type schema from the case class X. Then pass that schema to the reader with the .schema option when creating the Dataset. Example:
case class Y (a: String, b: Option[String] = None)
case class X (dummy: String, b: Y)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[X].schema
spark.read.schema(schema).json(Seq("{'dummy': '1', 'b': {'a': '1'}}").toDS).as[X].show()
//+-----+----+
//|dummy|   b|
//+-----+----+
//|    1|[1,]|
//+-----+----+
Select b column from struct type:
spark.read.schema(schema).json(Seq("{'dummy': '1', 'b': {'a': '1'}}").toDS).as[X].
select("b.b").show()
//+----+
//|   b|
//+----+
//|null|
//+----+
Print the schema:
spark.read.schema(schema).json(Seq("{'dummy': '1', 'b': {'a': '1'}}").toDS).as[X].printSchema
//root
// |-- dummy: string (nullable = true)
// |-- b: struct (nullable = true)
// |    |-- a: string (nullable = true)
// |    |-- b: string (nullable = true)
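For anyone who wants to run this outside the shell, here is a minimal standalone sketch of the same approach. It assumes Spark 2.2 or later (for spark.read.json on a Dataset[String]) and a local master. The object name OptionalFieldExample, the helper parse, and the second sample record are made up for illustration and are not part of the original code. The likely reason the explicit schema helps is that, without it, Spark infers the schema from the data, so the struct b only contains a and .as[X] cannot resolve b.b.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Same case classes as above; Option marks the nested field as optional.
// They are declared at top level so that Spark can derive encoders for them.
case class Y(a: String, b: Option[String] = None)
case class X(dummy: String, b: Y)

// Hypothetical wrapper object, for illustration only.
object OptionalFieldExample {

  // Parses JSON lines into X using the schema derived from the case class,
  // instead of letting Spark infer the schema from the data.
  def parse(spark: SparkSession, lines: Seq[String]): Array[X] = {
    import spark.implicits._
    val schema = Encoders.product[X].schema
    spark.read.schema(schema).json(lines.toDS).as[X].collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is fine for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("optional-field-example")
      .getOrCreate()

    val rows = parse(spark, Seq(
      """{"dummy": "1", "b": {"a": "1"}}""",            // nested b is missing
      """{"dummy": "2", "b": {"a": "2", "b": "x"}}"""   // nested b is present
    ))
    rows.foreach(println)

    spark.stop()
  }
}
```

With the schema supplied up front, the record without the nested b should come back with None in that field, and the record that has it should come back with Some("x").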
Upvotes: 4