Ian

Reputation: 1304

Spark 1.4.1: Issue when reading MySQL BigInt columns

When reading MySQL columns of type BIGINT (e.g. BIGINT(21) UNSIGNED), Spark cannot cast java.math.BigDecimal to a String in the following snippet:

import org.apache.spark.sql.functions.col  // needed for col() in the filter below

val driver = "com.mysql.jdbc.Driver"
val server = ...
val infoSchema = "INFORMATION_SCHEMA"
val port = 3306
val user = ...
val pw = ...
val dbUrl = s"jdbc:mysql://$server:$port/$infoSchema"

val dbProperties = new java.util.Properties()
dbProperties.setProperty("driver", driver)
dbProperties.setProperty("user", user)
dbProperties.setProperty("password", pw)

val schema = ...
val table = ...

val cols = sqlContext.read.jdbc(dbUrl, "COLUMNS", dbProperties)
  .filter(col("TABLE_SCHEMA") === schema && col("TABLE_NAME") === table)
  .map(_.getValuesMap[String](Seq("ORDINAL_POSITION", "COLUMN_NAME")))
  .collect()
  .toList

cols.map(e => e("COLUMN_NAME"))
cols.map(e => e("ORDINAL_POSITION")) // java.math.BigDecimal cannot be cast to java.lang.String

However, when I do the following, there is no issue:

val num = new java.math.BigDecimal(1)
num.toString

Is this a bug or am I missing something?

Upvotes: 0

Views: 315

Answers (1)

zero323

Reputation: 330163

Row.getValuesMap[T] is not used for type casting. Instead, it explicitly states that the values are of type T (internally it is just a get followed by asInstanceOf[T]), and a BigDecimal is clearly not a String.
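
A minimal sketch of the distinction, outside Spark (the value below is only illustrative):

val value: Any = new java.math.BigDecimal(1)
value.toString               // fine: an explicit conversion to a String
value.asInstanceOf[String]   // throws ClassCastException, which is what getValuesMap[String] amounts to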

You could:

  • Add an implicit conversion.
  • Use _.getValuesMap[Any] (see the sketch after the cast example below).
  • Use a SQL cast before mapping:

    withColumn("ORDINAL_POSITION", $"ORDINAL_POSITION".cast(StringType))
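
For example, a minimal sketch of the getValuesMap[Any] option, reusing the pipeline from the question (dbUrl, dbProperties, schema and table as defined there):

val cols = sqlContext.read.jdbc(dbUrl, "COLUMNS", dbProperties)
  .filter(col("TABLE_SCHEMA") === schema && col("TABLE_NAME") === table)
  .map(_.getValuesMap[Any](Seq("ORDINAL_POSITION", "COLUMN_NAME")))  // no per-value cast here
  .collect()
  .toList

cols.map(e => e("ORDINAL_POSITION").toString)  // convert explicitly instead of casting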
    

but to be honest, all of these options are rather ugly. It makes more sense to extract the values directly:

import org.apache.spark.sql.Row

sqlContext.read.jdbc(...).filter(...)
  .select("ORDINAL_POSITION", "COLUMN_NAME")
  .rdd
  .map { case Row(i: java.math.BigDecimal, c: String) => (i, c) }
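
As a usage sketch, assuming the same dbUrl, dbProperties, schema and table values as in the question, the extracted pairs can be collected locally, with any conversion to String done explicitly inside the pattern match:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col

val positions = sqlContext.read.jdbc(dbUrl, "COLUMNS", dbProperties)
  .filter(col("TABLE_SCHEMA") === schema && col("TABLE_NAME") === table)
  .select("ORDINAL_POSITION", "COLUMN_NAME")
  .rdd
  .map { case Row(i: java.math.BigDecimal, c: String) => (i.toString, c) }  // explicit conversion, not a cast
  .collect()
  .toList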

Upvotes: 2
