Reputation: 291
I have a dataframe (df1
) with 2 StringType
fields.
Field1 (StringType) Value-X
Field2 (StringType) value-20180101
All I am trying to do is create another dataframe (df2
) from df1
with 2 fields-
Field1 (StringType) Value-X
Field2 (Date Type) Value-2018-01-01
I am using the below code-
df2=df1.select(
col("field1").alias("f1"),
unix_timestamp(col("field2"),"yyyyMMdd").alias("f2")
)
df2.show
df2.printSchema
For this field 2, I tried multiple things - unix_timestamp
, from_unixtimestamp
, to_date
, cast(“date”)
but nothing worked
I need the following schema as output:
df2.printSchema
|-- f1: string (nullable = false)
|-- f2: date (nullable = false)
I'm using Spark 2.1
Upvotes: 1
Views: 10249
Reputation: 22449
to_date
seems to work fine for what you need:
import org.apache.spark.sql.functions._
val df1 = Seq( ("X", "20180101"), ("Y", "20180406") ).toDF("c1", "c2")
val df2 = df1.withColumn("c2", to_date($"c2", "yyyyMMdd"))
df2.show
// +---+----------+
// | c1| c2|
// +---+----------+
// | X|2018-01-01|
// | Y|2018-04-06|
// +---+----------+
df2.printSchema
// root
// |-- c1: string (nullable = true)
// |-- c2: date (nullable = true)
[UPDATE]
For Spark 2.1 or prior, to_date
doesn't take format string as a parameter, hence explicit string formatting to the standard yyyy-MM-dd
format using, say, regexp_replace
is needed:
val df2 = df1.withColumn(
"c2", to_date(regexp_replace($"c2", "(\\d{4})(\\d{2})(\\d{2})", "$1-$2-$3"))
)
Upvotes: 4