Reputation: 29159
The following code reads data from a database table and returns a Dataset[Cols].
case class Cols(F1: String, F2: BigDecimal, F3: Int, F4: Date, ...)

def readTable(): Dataset[Cols] = {
  import sqlContext.sparkSession.implicits._
  sqlContext.read.format("jdbc").options(Map(
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" -> jdbcSqlConn,
    "dbtable" -> s"..."
  )).load()
    .select("F1", "F2", "F3", "F4")
    .as[Cols]
}
The values may be null. Later, a runtime exception was raised when these fields were used.
val r = readTable.filter(x => (if (x.F3 > ...
What's the idiomatic Scala way to handle nulls in the Dataset?
I get the error below when running the code.

java.lang.NullPointerException
  at scala.math.BigDecimal.$minus(BigDecimal.scala:563)
  at MappingPoint$$anonfun$compare$1.apply(Mapping.scala:51)
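For reference, a minimal sketch of what the trace appears to show (the values here are made up): the encoder hands back a null reference for a NULL decimal column, and subtracting it fails inside $minus when the argument is dereferenced.

// Minimal sketch of the failure mode; the values are made up.
val a = BigDecimal(1)
val b: BigDecimal = null // what a SQL NULL column deserializes to for F2: BigDecimal
val d = a - b            // NullPointerException inside scala.math.BigDecimal.$minus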
Upvotes: 0
Views: 519
Reputation: 1617
Options are the idiomatic way:
case class Cols (F1: Option[String], F2: Option[BigDecimal], F3: Option[Int], F4: Option[Date], ...)
There is a performance hit, as discussed in the Databricks Scala style guide.
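A minimal sketch of how the Option fields are then consumed, assuming the readTable() from the question; the comparison values are made up:

// Sketch: the encoder maps SQL NULL to None, so field access goes
// through the Option API instead of risking a null dereference.
val r = readTable().filter { x =>
  x.F2.exists(_ > BigDecimal(0)) && // false when F2 is NULL in the table
  x.F3.getOrElse(0) > 10            // fall back to a default when F3 is NULL
}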
Upvotes: 4
Reputation: 1317
Option(null) will return None.
Thus, for instance:
val r = readTable.filter(x => (if (Option(x.F3).getOrElse(0) >
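A fuller sketch of the same idea, using F2 rather than F3, since BigDecimal is a reference type and is what actually arrives as null; the default and comparison values are made up:

// Sketch: Option(...) lifts a possibly-null reference into an Option,
// so a default can be supplied inline via getOrElse.
val r = readTable().filter(x => Option(x.F2).getOrElse(BigDecimal(0)) > BigDecimal(0))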
Upvotes: 2