Reputation: 1097
I'm trying to create a Spark DataFrame from scratch for testing purposes.
I want a date column, and for that I'm using java.sql.Date, like this:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

val testSchema = List(
  StructField("test_string", StringType, true),
  StructField("test_date", DateType, true)
)

val testData = Seq(
  // the deprecated constructor takes (year - 1900, 0-based month, day-of-month)
  Row("hello", new java.sql.Date(2019 - 1900, 9, 29)),
  Row("world", new java.sql.Date(2019 - 1900, 7, 30))
)

val testDF = spark.createDataFrame(
  spark.sparkContext.parallelize(testData),
  StructType(testSchema)
)
This does the job, but the java.sql.Date constructor I'm using is deprecated, so I tried java.time.LocalDate and java.util.Date (instantiating it directly and also obtaining it from a java.util.GregorianCalendar), and both attempts fail with the same kind of error:
Caused by: java.lang.RuntimeException: java.util.Date is not a valid external type for schema of date
and
Caused by: java.lang.RuntimeException: java.time.LocalDate is not a valid external type for schema of date
So what is the correct replacement for java.sql.Date that matches the DateType schema?
Upvotes: 2
Views: 752
Reputation: 361
It should work with java.sql.Date.valueOf(). By default Spark expects java.sql.Date as the external type for DateType columns (which is why java.time.LocalDate and java.util.Date are rejected by the Row encoder), and valueOf() builds one from a yyyy-mm-dd string without touching the deprecated constructor:
import java.sql.Date

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

val testSchema = List(
  StructField("test_string", StringType, true),
  StructField("test_date", DateType, true)
)

val testData = Seq(
  Row("hello", Date.valueOf("2019-10-29")),
  Row("world", Date.valueOf("2019-08-30"))
)

val testDF = spark.createDataFrame(
  spark.sparkContext.parallelize(testData),
  StructType(testSchema)
)

testDF.show()
testDF.printSchema()
+-----------+----------+
|test_string| test_date|
+-----------+----------+
| hello|2019-10-29|
| world|2019-08-30|
+-----------+----------+
root
|-- test_string: string (nullable = true)
|-- test_date: date (nullable = true)
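As a side note, if you are on Spark 3.0 or later (an assumption about your version, not something stated in the question), java.time.LocalDate can be used directly: once the Java 8 datetime API is enabled via spark.sql.datetime.java8API.enabled, DateType accepts LocalDate as its external type. A minimal sketch under that assumption:

// assumes Spark 3.0+; on 2.x this config does not exist and LocalDate
// still triggers the "not a valid external type" error from the question
import java.time.LocalDate

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

spark.conf.set("spark.sql.datetime.java8API.enabled", "true")

val localDateDF = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(
    Row("hello", LocalDate.of(2019, 10, 29)),
    Row("world", LocalDate.of(2019, 8, 30))
  )),
  StructType(List(
    StructField("test_string", StringType, true),
    StructField("test_date", DateType, true)
  ))
)

And if you don't need to build the schema by hand, a Seq of tuples with the implicit encoders gives the same result with less ceremony (the column names here are just my choice):

import java.sql.Date
import spark.implicits._

// the encoder infers StringType and DateType from the tuple's field types
val tupleDF = Seq(
  ("hello", Date.valueOf("2019-10-29")),
  ("world", Date.valueOf("2019-08-30"))
).toDF("test_string", "test_date")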
Upvotes: 2