user5068547

Reputation: 206

How to calculate difference between date column and current date?

I am trying to calculate the date difference between a column field and the system's current date.

Here is my sample code, in which I have hard-coded my column field as 20170126.

val currentDate = java.time.LocalDate.now
var datediff = spark.sqlContext.sql("""Select  datediff(to_date('$currentDate'),to_date(DATE_FORMAT(CAST(unix_timestamp( cast('20170126' as String), 'yyyyMMdd') AS TIMESTAMP), 'yyyy-MM-dd'))) AS  GAP
        """)
datediff.show()

The output is:
+----+
| GAP|
+----+
|null|
+----+ 

I need to calculate the actual gap between the two dates, but I am getting NULL.
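For comparison, here is a plain-Scala sketch (no Spark) of the value the query is meant to return. It uses java.time, which the snippet above already relies on. It also shows one thing worth checking in that snippet. The triple-quoted SQL string has no `s` prefix, so `$currentDate` is never substituted, and Spark receives the literal text `$currentDate`. By default `to_date` returns null for a string it cannot parse, and `datediff` of a null is null. The names `DateGap`, `parseCompact` and `gapInDays` are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object DateGap {
  // BASIC_ISO_DATE is java.time's built-in formatter for compact dates such as "20170126"
  def parseCompact(s: String): LocalDate =
    LocalDate.parse(s, DateTimeFormatter.BASIC_ISO_DATE)

  // Whole days from `from` to `to`; positive when `to` is later (same sign convention as datediff(to, from))
  def gapInDays(from: LocalDate, to: LocalDate): Long =
    ChronoUnit.DAYS.between(from, to)

  def main(args: Array[String]): Unit = {
    val currentDate = LocalDate.now

    // No `s` prefix: `$currentDate` stays in the string as literal text
    val literal = """select to_date('$currentDate')"""
    // With the `s` prefix: today's date is substituted in ISO form (yyyy-MM-dd)
    val interpolated = s"""select to_date('$currentDate')"""

    println(literal)
    println(interpolated)
    println(gapInDays(parseCompact("20170126"), currentDate))
  }
}
```

For example, `gapInDays(parseCompact("20170126"), LocalDate.of(2017, 4, 19))` is 83 days.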

Upvotes: 1

Views: 5425

Answers (1)

Jacek Laskowski

Reputation: 74669

You have not defined the type and format of the "column field", so I assume it is a string in the (not-very-pleasant) format yyyyMMdd.

import org.apache.spark.sql.functions._
import spark.implicits._

val records = Seq((0, "20170126")).toDF("id", "date")

scala> records.show
+---+--------+
| id|    date|
+---+--------+
|  0|20170126|
+---+--------+

scala> records
  .withColumn("year", substring($"date", 1, 4))
  .withColumn("month", substring($"date", 5, 2))
  .withColumn("day", substring($"date", 7, 2))
  .withColumn("d", concat_ws("-", $"year", $"month", $"day"))
  .select($"id", $"d" cast "date")
  .withColumn("datediff", datediff(current_date(), $"d"))
  .show
+---+----------+--------+
| id|         d|datediff|
+---+----------+--------+
|  0|2017-01-26|      83|
+---+----------+--------+
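The substring/concat_ws pipeline above slices the compact string into year, month and day and joins them with dashes. The result is a yyyy-MM-dd string that the cast to date accepts. Here is a plain-Scala sketch of the same transformation that makes the index arithmetic explicit (no Spark; `CompactDate` and `toIsoDate` are hypothetical names). Spark's `substring` is 1-based, while Scala's `String.substring` is 0-based and end-exclusive.

```scala
import java.time.LocalDate

object CompactDate {
  // Mirrors the three substring calls (year, month, day) and the concat_ws("-", ...) step
  def toIsoDate(compact: String): String = {
    val year  = compact.substring(0, 4) // characters 1-4 in Spark's 1-based terms
    val month = compact.substring(4, 6) // characters 5-6
    val day   = compact.substring(6, 8) // characters 7-8
    Seq(year, month, day).mkString("-")
  }

  def main(args: Array[String]): Unit = {
    val iso = toIsoDate("20170126")
    println(iso)                  // 2017-01-26
    println(LocalDate.parse(iso)) // yyyy-MM-dd parses with the default ISO formatter
  }
}
```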

PROTIP: Read up on the functions object (org.apache.spark.sql.functions).

Caveats

cast

Please note that I could not convince Spark SQL to cast the column "date" to DateType, because a string like 20170126 matches none of the formats accepted by DateTimeUtils.stringToDate:

  • yyyy
  • yyyy-[m]m
  • yyyy-[m]m-[d]d
  • yyyy-[m]m-[d]d (followed by a single space)
  • yyyy-[m]m-[d]d *
  • yyyy-[m]m-[d]dT*
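To see why 20170126 falls outside the shapes listed above, here is a rough regex approximation of them. This is an assumption, not Spark's actual parser: it checks only the layout, not value ranges such as month 1 to 12, and `StringToDateShapes` and `looksCastable` are hypothetical names. An 8-digit run has no dash after the first four digits, so it matches none of the shapes.

```scala
object StringToDateShapes {
  // Approximates: yyyy, yyyy-[m]m, yyyy-[m]m-[d]d, optionally followed by ' ' or 'T' and anything else
  private val Accepted = """\d{4}(-\d{1,2}(-\d{1,2}([ T].*)?)?)?""".r

  // matches() requires the whole string to fit the pattern
  def looksCastable(s: String): Boolean =
    Accepted.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    val samples = Seq("20170126", "2017-01-26", "2017-1-6", "2017-01-26T10:00:00", "2017/01/26")
    samples.foreach(s => println(s"$s -> ${looksCastable(s)}"))
  }
}
```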

date_format

I could not convince date_format to work either, so I parsed the "date" column myself using the substring and concat_ws functions.

Upvotes: 3
