Reputation: 3043
I'm trying to use : org.apache.spark.sql.functions's date_format to format a given string Date but it doesn't work :
here is the code that I execute :
val data = Seq("20190413","20190310","20190923").toDF
data.show
+--------+
| value|
+--------+
|20190413|
|20190310|
|20190923|
+--------+
val data2=data.withColumn("date",date_format(col("value"),"yyyyMMdd"))
data2.show
+--------+----+
| value|date|
+--------+----+
|20190413|null|
|20190310|null|
|20190923|null|
+--------+----+
from the doc it is said that :
date_format(dateExpr: Column, format: String): Column
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.See java.text.SimpleDateFormat for valid date and time format patterns.
What's wrong with my code?
Upvotes: 2
Views: 4442
Reputation: 9427
If you're using strings, they should be ISO-formatted dates. The format you specify is applied to output.
scala> val data=Seq("2019-04-13","2019-03-10","2019-09-23").toDF
scala> val data2=data3.
| withColumn("date",date_format(col("value"),"yyyyMMdd")).
| withColumn("date-US",date_format(col("value"),"MM/dd/yyyy"))
scala> data2.show()
+----------+--------+----------+
| value| date| date-US|
+----------+--------+----------+
|2019-04-13|20190413|04/13/2019|
|2019-03-10|20190310|03/10/2019|
|2019-09-23|20190923|09/23/2019|
+----------+--------+----------+
EDIT
If your string representation of dates is in some other format, you will probably need to convert them to proper date type, and then format as you like. For example:
scala> val data2=data.withColumn("date-US",date_format(to_date(col("value"),"yyyyMMdd"),"MM/dd/yyyy"))
scala> data2.show()
+--------+----------+
| value| date-US|
+--------+----------+
|20190413|04/13/2019|
|20190310|03/10/2019|
|20190923|09/23/2019|
+--------+----------+
Format used in to_date()
function here is the input format.
Upvotes: 1
Reputation: 1892
In Spark you can use unix_timestamp and from_unixtime
scala> data.withColumn("tt",from_unixtime(unix_timestamp(col("value"),"yyyyMMdd"),"yyyy/MM/dd")).show
+--------+----------+
| value| tt|
+--------+----------+
|20190413|2019/04/13|
|20190310|2019/03/10|
|20190923|2019/09/23|
+--------+----------+
this is a proper way to achieving the desired output.I
Upvotes: 0