Capa
Capa

Reputation: 65

Convert week to date spark

I have a string in the format of

"5/02/2016" // d/ww/yyyy

that I want to convert to a format of

yyyy-mm-dd

I have tried the following

val df = Seq((1L, "5/02/2016"), (2L, "aaa")).toDF("id", "date")
val ts = unix_timestamp($"date", "d/ww/yyyy").cast("timestamp")
df.withColumn("ts", ts).show(2, false)

I get

//output
+---+---------+-----------+
|id |date     |ts         |
+---+---------+-----------+
|1  |5/02/2016|2016-01-05 |
|2  |aaa      |null       |
+---+---------+-----------+

When I want

//expected
+---+---------+-----------+
|id |date     |ts         |
+---+---------+-----------+
|1  |5/02/2016|2016-01-19 |
|2  |aaa      |null       |
+---+---------+-----------+

Upvotes: 1

Views: 1583

Answers (2)

Leo C
Leo C

Reputation: 22439

As pointed out by @puhlen, day of week should be u, not d (see SimpleDateFormat)

val df = Seq((1L, "5/02/2016"), (2L, "aaa")).toDF("id", "date")
val ts = unix_timestamp($"date", "u/ww/yyyy").cast("timestamp")
df.withColumn("ts", ts).show(2, false)

+---+---------+---------------------+
|id |date     |ts                   |
+---+---------+---------------------+
|1  |5/02/2016|2016-01-08 00:00:00.0|
|2  |aaa      |null                 |
+---+---------+---------------------+

Please also note that you shouldn't expect 5/02/2016 will be arithmetically the same as day 2 x 7 + 5 of 2016. You should actually check the 2016 calendar, and the Friday in its 2nd week is in fact January 8.

Upvotes: 1

user267817
user267817

Reputation:

Converting dates is a tricky business. In this case, leap years prevent us from mapping day of year directly to month of year and day of month.

In Scala, we may use the java.util.GregorianCalendar:

def weekToDate(weekStr: String) = {
  val (day, week, year) = {
    val arr = weekStr.split("/").map(_.toInt)
    (arr(0), arr(1), arr(2))
  }
  val cal = new java.util.GregorianCalendar()
  cal.set(java.util.Calendar.YEAR, year)
  cal.set(java.util.Calendar.DAY_OF_YEAR, 7 * week + day)
  new java.text.SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)
}
weekToDate("5/02/2016") // res0: String = 2016-01-19

// Leap year example
weekToDate("4/08/2016") // res1: String = 2016-02-29
weekToDate("4/08/2017") // res2: String = 2017-03-01

Putting it all together:

import spark.implicits._
import org.apache.spark.sql.functions.udf

def weekToDate(weekStr: String) = {
  val (day, week, year) = {
    val arr = weekStr.split("/").map(_.toInt)
    (arr(0), arr(1), arr(2))
  }
  val cal = new java.util.GregorianCalendar()
  cal.set(java.util.Calendar.YEAR, year)
  cal.set(java.util.Calendar.DAY_OF_YEAR, 7 * week + day)
  new java.text.SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)
}

val df = Seq((1L, "5/02/2016"), (2L, "4/8/2016")).toDF("id", "date").select("date")

val wfn: String => String = weekToDate(_)
val tsUDF=udf(wfn)
df.withColumn("ts", tsUDF('date)).show(2, false)

+---------+----------+
|date     |ts        |
+---------+----------+
|5/02/2016|2016-01-19|
|4/8/2016 |2016-02-29|
+---------+----------+

Upvotes: 2

Related Questions