Reputation: 17724
From Spark 1.5.x on, some convenience methods were added to deal with time.
df.select(datediff($"A", $"B")).show
But this returns the difference in days. Is there any option to convert it to a different interval e.g. years?
Manually I used
df.select(year($"A")- year($"B")).show
which is good enough. But it would be better if the date difference were calculated more accurately, e.g. including months and days, as shown at http://howtodoinjava.com/for-fun-only/java-code-to-calculate-age-from-date-of-birth/
Is a UDF the go-to solution for the current Spark 2.0.2, or am I missing an option to convert the difference in days to e.g. years?
Upvotes: 1
Views: 3289
Reputation: 330413
The Timestamp type can be converted to its Unix timestamp in seconds. If this resolution is good enough for you, the rest is just a bunch of arithmetic operations. You can compute the difference in seconds as:
import org.apache.spark.sql.Column
def secondsBetween(col1: Column, col2: Column) =
col2.cast("timestamp").cast("bigint") - col1.cast("timestamp").cast("bigint")
and re-sample according to your needs, for example:
def minutesBetween(col1: Column, col2: Column) =
  (secondsBetween(col1, col2) / 60).cast("bigint")
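The same arithmetic extends to coarser intervals such as days or approximate years. The sketch below is only an approximation (it treats a year as 365.25 days on average), and daysBetween / yearsBetween are illustrative names, not built-in functions:
// Approximate day and year differences derived from the seconds difference.
def daysBetween(col1: Column, col2: Column) =
  (secondsBetween(col1, col2) / (24L * 60 * 60)).cast("bigint")
def yearsBetween(col1: Column, col2: Column) =
  (secondsBetween(col1, col2) / (365.25 * 24 * 60 * 60)).cast("bigint")
// e.g. df.select(yearsBetween($"B", $"A")).show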
Obviously this doesn't handle all the subtleties of date and time processing, like daylight saving time or leap seconds, so if these are important in your domain then a UDF with a proper time-processing library is recommended.
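If you need exact year/month/day components (as in the linked age calculation), one way to sketch such a UDF is with java.time.Period; ageParts is a hypothetical name and this assumes the columns are of DateType (cast them first if they are timestamps):
import java.sql.Date
import java.time.Period
import org.apache.spark.sql.functions.udf
// Period.between accounts for calendar details such as leap years and
// varying month lengths; None is returned for null inputs.
val ageParts = udf { (start: Date, end: Date) =>
  if (start == null || end == null) None
  else {
    val p = Period.between(start.toLocalDate, end.toLocalDate)
    Some(s"${p.getYears} years, ${p.getMonths} months, ${p.getDays} days")
  }
}
// e.g. df.select(ageParts($"B", $"A")).show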
Upvotes: 5