Reputation: 18108
Why is that for a case class I can do
fieldn: Option[Int]
or
fieldn: Option[Integer]
but for a StructType I must use the following?
StructField("fieldn", IntegerType, true)
Upvotes: 0
Views: 1681
Reputation: 7056
I understand why it seems inconsistent - the reason is convenience. It is more convenient to give Spark a case class because case classes are really easy to work with in Scala. Behind the scenes, Spark takes the case class you give it and uses it to determine the schema for your DataFrame, converting all Java/Scala types to Spark SQL's types along the way. For example, for the following case class:
case class TestIntConversion(javaInteger: java.lang.Integer, scalaInt: scala.Int, scalaOptionalInt: Option[scala.Int])
You get a schema like this:
root
|-- javaInteger: integer (nullable = true)
|-- scalaInt: integer (nullable = false)
|-- scalaOptionalInt: integer (nullable = true)
In the latest version of Spark, the thing that does the conversion for you is an Encoder. You can see a ton of the conversions in ExpressionEncoderSuite.
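As a minimal sketch (assuming a Spark version where Encoders.product is available, e.g. Spark 2.x or later), you can see that derivation directly by asking the encoder for its schema:

import org.apache.spark.sql.Encoders

// Derive the encoder Spark would use for this case class and print the schema it infers.
// Encoders.product works for any Product type, which includes case classes.
Encoders.product[TestIntConversion].schema.printTreeString()
// root
//  |-- javaInteger: integer (nullable = true)
//  |-- scalaInt: integer (nullable = false)
//  |-- scalaOptionalInt: integer (nullable = true)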
Upvotes: 2
Reputation: 11
The Option type denotes values that can be undefined (None), so it is mostly applicable to data. There is no position at which it could be meaningfully used in your StructField example:

A schema must be fully defined, so Option[StructField] doesn't provide any information about the type, nor is it semantically truthful. Anything along the lines of Option[DataType] or Option[IntegerType], i.e.

StructField("fieldn", Some(IntegerType))

would make even less sense - either creating an object with unclear semantics (the former) or requiring an impossible API (the latter).
Fundamentally, StructType represents obligatory metadata. It cannot be missing by design, and because of that Option doesn't have any place there.
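As a minimal sketch (reusing the field name from the question), the programmatic equivalent expresses nullability through the nullable flag on StructField rather than by wrapping anything in Option:

import org.apache.spark.sql.types._

// The schema itself is always fully specified; the nullable flag only says
// whether the data in this column may contain nulls - the role Option[Int]
// plays in a case class.
val schema = StructType(Seq(
  StructField("fieldn", IntegerType, nullable = true)
))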
Upvotes: 1