aName

Reputation: 3043

Spark function date_format doesn't work

I'm trying to use org.apache.spark.sql.functions' date_format to format a given date string, but it doesn't work. Here is the code that I execute:

val data = Seq("20190413","20190310","20190923").toDF
data.show 
    +--------+
    |   value|
    +--------+
    |20190413|
    |20190310|
    |20190923|
    +--------+
val data2=data.withColumn("date",date_format(col("value"),"yyyyMMdd"))
data2.show
    +--------+----+
    |   value|date|
    +--------+----+
    |20190413|null|
    |20190310|null|
    |20190923|null|
    +--------+----+

From the documentation:

date_format(dateExpr: Column, format: String): Column
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.

See java.text.SimpleDateFormat for valid date and time format patterns.

What's wrong with my code?

Upvotes: 2

Views: 4442

Answers (2)

mazaneicha

Reputation: 9427

If you're using strings, they should be ISO-formatted dates. The format you specify is applied to the output.

scala> val data=Seq("2019-04-13","2019-03-10","2019-09-23").toDF
scala> val data2=data.
     |           withColumn("date",date_format(col("value"),"yyyyMMdd")).
     |           withColumn("date-US",date_format(col("value"),"MM/dd/yyyy"))
scala> data2.show()
+----------+--------+----------+
|     value|    date|   date-US|
+----------+--------+----------+
|2019-04-13|20190413|04/13/2019|
|2019-03-10|20190310|03/10/2019|
|2019-09-23|20190923|09/23/2019|
+----------+--------+----------+

EDIT

If your string representation of dates is in some other format, you will probably need to convert them to a proper date type first, and then format them as you like. For example:

scala> val data2=data.withColumn("date-US",date_format(to_date(col("value"),"yyyyMMdd"),"MM/dd/yyyy"))
scala> data2.show()
+--------+----------+
|   value|   date-US|
+--------+----------+
|20190413|04/13/2019|
|20190310|03/10/2019|
|20190923|09/23/2019|
+--------+----------+

The format used in the to_date() function here is the input format.
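Outside of Spark, the same parse-then-format pattern can be sketched with plain java.time (a minimal illustration of the input-pattern vs. output-pattern distinction, not Spark's actual internals):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object FormatSketch extends App {
  // Input pattern: how the raw string is parsed (the role to_date plays).
  val inputFmt = DateTimeFormatter.ofPattern("yyyyMMdd")
  // Output pattern: how the parsed date is rendered (the role date_format plays).
  val outputFmt = DateTimeFormatter.ofPattern("MM/dd/yyyy")

  val formatted = LocalDate.parse("20190413", inputFmt).format(outputFmt)
  println(formatted) // 04/13/2019
}
```

Applying the output pattern directly to an unparsed string, as in the question, is like skipping the `LocalDate.parse` step: there is no date value to format, so Spark yields null.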

Upvotes: 1

Mahesh Gupta

Reputation: 1892

In Spark you can use unix_timestamp and from_unixtime

scala> data.withColumn("tt",from_unixtime(unix_timestamp(col("value"),"yyyyMMdd"),"yyyy/MM/dd")).show

+--------+----------+
|   value|        tt|
+--------+----------+
|20190413|2019/04/13|
|20190310|2019/03/10|
|20190923|2019/09/23|
+--------+----------+

This is a proper way of achieving the desired output.

Upvotes: 0
