Reputation: 11885
Regarding the following code:
val types = df.schema.fields.map(_.dataType)
types(i) match {
  case StringType => println("String")
  case ArrayType => println("Array")
  case _ => println("Miscellaneous")
}
ArrayType produces an error:
Pattern type is incompatible with expected type, found: ArrayType.type, required: DataType
At first glance this looks weird, because both StringType (which doesn't produce an error) and ArrayType are defined in the same package (org.apache.spark.sql.types).
But then, after checking the hierarchy tree, I figured out that the closest common ancestor of the two classes is AbstractDataType:
AbstractDataType -> DataType -> AtomicType -> StringType
vs
AbstractDataType -> ArrayType
types is an Array of DataType. So I could probably just cast types to an Array of AbstractDataType, but I don't think that's the solution: if types is an Array of DataType, it should already cover columns whose cells contain arrays.
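For reference, a minimal sketch of how such a types array is obtained, using a hypothetical schema with one string column and one array column (the column names are illustrative):
import org.apache.spark.sql.types._
// Hypothetical schema matching the scenario above
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("tags", ArrayType(StringType, containsNull = true))
))
val types = schema.fields.map(_.dataType)
// types: Array[DataType] = Array(StringType, ArrayType(StringType,true))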
Upvotes: 0
Views: 795
Reputation: 7316
First of all, both ArrayType and StringType have DataType as a base class. The hierarchy in the two cases looks as follows:
StringType > AtomicType > DataType > AbstractDataType
ArrayType > DataType > AbstractDataType
I believe that in your example you confused the ArrayType class with its companion object, which indeed extends AbstractDataType directly.
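A minimal sketch of the distinction (the value names are illustrative):
import org.apache.spark.sql.types._
// The bare name ArrayType refers to the companion object; its type is
// ArrayType.type, which is exactly what the compiler error reports.
println(ArrayType.getClass.getName) // org.apache.spark.sql.types.ArrayType$
// An actual ArrayType value, i.e. a DataType, is built via the constructor:
val dt: DataType = ArrayType(StringType, containsNull = true)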
That means we can still pattern match on the derived types of DataType as you already did, but for ArrayType the match should be done against its constructor (extractor) pattern:
import org.apache.spark.sql.types._
// toDF below assumes spark.implicits._ is in scope (as in spark-shell)
val df = Seq(
  ("1", Seq("A", "D", "B")),
  ("2", Seq("A", "D", "B")),
  ("3", Seq("B", "C", "G")),
  ("5", Seq("B", "D", "L"))
).toDF("ID", "ar")
df.schema.fields.map {
  _.dataType match {
    case StringType => println("String")
    case ArrayType(_, _) => println("Array")
    case _ => println("Miscellaneous")
  }
}
// String
// Array
You can see that ArrayType expects two arguments, since its definition is:
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
whereas StringType does not:
class StringType private() extends AtomicType
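Because the match uses the case-class extractor, you can also bind the element type when you need it. A minimal sketch against the df defined above:
df.schema.fields.foreach { f =>
  f.dataType match {
    case ArrayType(elementType, containsNull) =>
      println(s"${f.name}: array of $elementType (containsNull = $containsNull)")
    case other =>
      println(s"${f.name}: $other")
  }
}
// ID: StringType
// ar: array of StringType (containsNull = true)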
Finally, if you only intended to get the string representation of each type, you could also use the typeName method of DataType:
df.schema.fields.map{_.dataType.typeName}
// res24: Array[String] = Array(string, array)
See also: Understanding Pattern Matching with Sub-classes
https://users.scala-lang.org/t/correct-syntax-in-pattern-matching/2187/4
Upvotes: 1
Reputation: 7207
A typed pattern also works here:
types(i) match {
  case StringType => println("String")
  case _: ArrayType => println("Array")
  case _ => println("Miscellaneous")
}
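If the element type is needed as well, the typed pattern can bind the matched value (a small sketch, reusing the types array from the question):
types(i) match {
  case StringType => println("String")
  case a: ArrayType => println(s"Array of ${a.elementType}")
  case _ => println("Miscellaneous")
}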
Upvotes: 0