Reputation: 65
I have a string in the format of
"5/02/2016" // d/ww/yyyy
that I want to convert to a format of
yyyy-mm-dd
I have tried the following
val df = Seq((1L, "5/02/2016"), (2L, "aaa")).toDF("id", "date")
val ts = unix_timestamp($"date", "d/ww/yyyy").cast("timestamp")
df.withColumn("ts", ts).show(2, false)
I get
//output
+---+---------+-----------+
|id |date |ts |
+---+---------+-----------+
|1 |5/02/2016|2016-01-05 |
|2 |aaa |null |
+---+---------+-----------+
When I want
//expected
+---+---------+-----------+
|id |date |ts |
+---+---------+-----------+
|1 |5/02/2016|2016-01-19 |
|2 |aaa |null |
+---+---------+-----------+
Upvotes: 1
Views: 1583
Reputation: 22439
As pointed out by @puhlen, day of week should be u
, not d
(see SimpleDateFormat)
val df = Seq((1L, "5/02/2016"), (2L, "aaa")).toDF("id", "date")
val ts = unix_timestamp($"date", "u/ww/yyyy").cast("timestamp")
df.withColumn("ts", ts).show(2, false)
+---+---------+---------------------+
|id |date |ts |
+---+---------+---------------------+
|1 |5/02/2016|2016-01-08 00:00:00.0|
|2 |aaa |null |
+---+---------+---------------------+
Please also note that you shouldn't expect 5/02/2016
will be arithmetically the same as day 2 x 7 + 5
of 2016
. You should actually check the 2016 calendar, and the Friday in its 2nd week is in fact January 8.
Upvotes: 1
Reputation:
Converting dates is a tricky business. In this case, leap years prevent us from mapping day of year directly to month of year and day of month.
In Scala, we may use the java.util.GregorianCalendar:
def weekToDate(weekStr: String) = {
val (day, week, year) = {
val arr = weekStr.split("/").map(_.toInt)
(arr(0), arr(1), arr(2))
}
val cal = new java.util.GregorianCalendar()
cal.set(java.util.Calendar.YEAR, year)
cal.set(java.util.Calendar.DAY_OF_YEAR, 7 * week + day)
new java.text.SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)
}
weekToDate("5/02/2016") // res0: String = 2016-01-19
// Leap year example
weekToDate("4/08/2016") // res1: String = 2016-02-29
weekToDate("4/08/2017") // res2: String = 2017-03-01
Putting it all together:
import spark.implicits._
import org.apache.spark.sql.functions.udf
def weekToDate(weekStr: String) = {
val (day, week, year) = {
val arr = weekStr.split("/").map(_.toInt)
(arr(0), arr(1), arr(2))
}
val cal = new java.util.GregorianCalendar()
cal.set(java.util.Calendar.YEAR, year)
cal.set(java.util.Calendar.DAY_OF_YEAR, 7 * week + day)
new java.text.SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)
}
val df = Seq((1L, "5/02/2016"), (2L, "4/8/2016")).toDF("id", "date").select("date")
val wfn: String => String = weekToDate(_)
val tsUDF=udf(wfn)
df.withColumn("ts", tsUDF('date)).show(2, false)
+---------+----------+
|date |ts |
+---------+----------+
|5/02/2016|2016-01-19|
|4/8/2016 |2016-02-29|
+---------+----------+
Upvotes: 2