Luis A.G.

Reputation: 1097

Alternative to deprecated java.sql.Date for Spark DataFrame

I'm trying to create a Spark DataFrame from scratch for testing purposes.

I want a date column, so I'm using java.sql.Date like this:

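import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

val testSchema = List(
  StructField("test_string", StringType, true),
  StructField("test_date", DateType, true)
)

// Deprecated constructor: the year is an offset from 1900 and the month is 0-based
// (9 = October, 7 = August).
val testData = Seq(
  Row("hello", new java.sql.Date(2019 - 1900, 9, 29)),
  Row("world", new java.sql.Date(2019 - 1900, 7, 30))
)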

val testDF = spark.createDataFrame(
  spark.sparkContext.parallelize(testData),
  StructType(testSchema)
)

This does the job, but this java.sql.Date constructor is deprecated, so I tried java.time.LocalDate and java.util.Date (both instantiating it directly and obtaining it from a java.util.GregorianCalendar). Each attempt failed with the same kind of error:

Caused by: java.lang.RuntimeException: java.util.Date is not a valid external type for schema of date

and

Caused by: java.lang.RuntimeException: java.time.LocalDate is not a valid external type for schema of date

So what is the correct replacement for java.sql.Date that matches the DateType schema?

Upvotes: 2

Views: 752

Answers (1)

Aleh Pranovich

Reputation: 361

Using java.sql.Date.valueOf(), which is not deprecated, should work:

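  import java.sql.Date
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}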

  val testSchema = List(
    StructField("test_string", StringType, true),
    StructField("test_date", DateType, true)
  )

  val testData = Seq(
    Row("hello", Date.valueOf("2019-10-29")),
    Row("world", Date.valueOf("2019-08-30"))
  )

  val testDF = spark.createDataFrame(
    spark.sparkContext.parallelize(testData),
    StructType(testSchema)
  )

  testDF.show()
  testDF.printSchema()


+-----------+----------+
|test_string| test_date|
+-----------+----------+
|      hello|2019-10-29|
|      world|2019-08-30|
+-----------+----------+

root
 |-- test_string: string (nullable = true)
 |-- test_date: date (nullable = true)
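If the dates already exist as java.time.LocalDate values, one option is to convert them with the Date.valueOf(LocalDate) overload (available since Java 8) before putting them into a Row. This is a minimal sketch using only the JDK, not Spark; the object name LocalDateToSqlDate and the helper toSqlDate are hypothetical names chosen for illustration:

```scala
import java.sql.Date
import java.time.LocalDate

// Hypothetical helper object, not part of Spark or the JDK.
object LocalDateToSqlDate {
  // Converts a java.time.LocalDate into a java.sql.Date via the
  // non-deprecated Date.valueOf(LocalDate) overload.
  def toSqlDate(d: LocalDate): Date = Date.valueOf(d)

  def main(args: Array[String]): Unit = {
    // LocalDate.of takes a 1-based month, unlike the deprecated Date constructor.
    val fromLocal  = toSqlDate(LocalDate.of(2019, 10, 29))
    val fromString = Date.valueOf("2019-10-29")

    // Both factory methods yield the same calendar date.
    assert(fromLocal == fromString)
    // The conversion round-trips back to the original LocalDate.
    assert(fromLocal.toLocalDate == LocalDate.of(2019, 10, 29))

    println(fromLocal)
  }
}
```

The resulting values can then be used in place of Date.valueOf("...") in the testData rows, e.g. Row("hello", LocalDateToSqlDate.toSqlDate(LocalDate.of(2019, 10, 29))).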

Upvotes: 2
