Ged

Reputation: 18108

Spark Scala Int vs Integer for Option vs StructType

Why is that for a case class I can do

fieldn: Option[Int]

or

fieldn: Option[Integer]

but for a StructType I must use the following?

StructField("fieldn", IntegerType, true)
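For reference, here is a minimal sketch of the two forms side by side (Record is just a placeholder name):

import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Case class form: nullability is expressed through Option on the field's type
case class Record(fieldn: Option[Int])

// StructType form: the type is a Spark SQL DataType object and
// nullability is a separate Boolean flag
val schema = StructType(Seq(
  StructField("fieldn", IntegerType, nullable = true)
))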

Upvotes: 0

Views: 1681

Answers (2)

Kit Menke

Reputation: 7056

I understand why it seems inconsistent - the reason is convenience. It is simply more convenient to give Spark a case class, because case classes are easy to work with in Scala.

Behind the scenes, Spark takes the case class you give it and uses it to determine the schema for your DataFrame, converting all Java/Scala types to Spark SQL types. For example, for the following case class:

case class TestIntConversion(javaInteger: java.lang.Integer, scalaInt: scala.Int, scalaOptionalInt: Option[scala.Int])

You get a schema like this:

root
 |-- javaInteger: integer (nullable = true)
 |-- scalaInt: integer (nullable = false)
 |-- scalaOptionalInt: integer (nullable = true)

In recent versions of Spark, the component that does this conversion for you is an Encoder. You can see many of these conversions exercised in ExpressionEncoderSuite.
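As a rough illustration (a sketch, not the exact internals), you can ask Spark for the derived schema directly through Encoders.product and print the same tree:

import org.apache.spark.sql.Encoders

case class TestIntConversion(javaInteger: java.lang.Integer, scalaInt: scala.Int, scalaOptionalInt: Option[scala.Int])

// Encoders.product derives an encoder for the case class; its schema
// shows which Spark SQL type each JVM type was mapped to.
val schema = Encoders.product[TestIntConversion].schema
schema.printTreeString() // prints the root |-- ... tree shown above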

Upvotes: 2

user11162301

Reputation: 11

An Option type denotes a value that may be undefined (None), so it applies to data. There is no position at which it could be meaningfully used in your StructField example.

A schema must be fully defined, so

Option[StructField]

would provide no information about the field's type, nor would it be semantically truthful. Anything along the lines of

Option[DataType]

or

Option[IntegerType]

i.e.

StructField("fieldn", Some(IntegerType))

would make even less sense - either creating an object with unclear semantics (the former) or an impossible API (the latter).

Fundamentally, a StructType represents obligatory metadata. It cannot be missing by design, and because of that, Option has no place there.
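To make that concrete, a minimal sketch (field names are illustrative): in a schema, nullability is a Boolean flag on each field, never an Option around the type:

import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// The schema itself is always fully defined; only the data may be null.
val schema = StructType(Seq(
  StructField("fieldn", IntegerType, nullable = true),    // values may be null
  StructField("required", IntegerType, nullable = false)  // values must not be null
))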

Upvotes: 1
