Reputation: 6465
I have a dataframe with two columns of dates in Unix time, and I want to find the week difference between these two columns. There is a weekOfYear
UDF in Spark SQL, but it is only useful when both dates fall in the same year. How can I find the week difference otherwise?
p.s. I'm using Scala Spark.
Upvotes: 0
Views: 1174
Reputation: 88
Since your dates are in UNIXTIME format (seconds since the epoch), you can use this expression:
((date1 - date2) / (60*60*24*7)).toInt
Edit: updating this answer with an example.
// 60*60*24*7 = 604800 seconds in one week
val weekdiff = spark.udf.register("weekdiff", (from: Long, to: Long) => ((from - to) / 604800).toInt)

df.withColumn("weekdiff", weekdiff(df("date1_col_name"), df("date2_col_name")))
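Because the UDF is registered by name, it should also be usable inside a SQL expression, for instance via selectExpr (the column names here are just the placeholders from the snippet above):

df.selectExpr("date1_col_name", "date2_col_name", "weekdiff(date1_col_name, date2_col_name) AS weekdiff")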
Upvotes: 1
Reputation: 3544
You can take the approach of creating a custom UDF for this:
scala> val df=sc.parallelize(Seq((1480401142453L,1480399932853L))).toDF("date1","date2")
df: org.apache.spark.sql.DataFrame = [date1: bigint, date2: bigint]
scala> df.show
+-------------+-------------+
|        date1|        date2|
+-------------+-------------+
|1480401142453|1480399932853|
+-------------+-------------+
scala> val udfDateDifference = udf((date1: Long, date2: Long) => ((date1 - date2) / (60*60*24*7)).toInt)
udfDateDifference: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,IntegerType,Some(List(LongType, LongType)))
scala> val resultDF = df.withColumn("dateDifference", udfDateDifference(df("date1"), df("date2")))
resultDF: org.apache.spark.sql.DataFrame = [date1: bigint, date2: bigint ... 1 more field]
scala> resultDF.show
+-------------+-------------+--------------+
|        date1|        date2|dateDifference|
+-------------+-------------+--------------+
|1480401142453|1480399932853|             2|
+-------------+-------------+--------------+
And hence you can get the difference!
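One caveat: the sample values above (e.g. 1480401142453) look like millisecond-precision Unix timestamps, while the divisor 60*60*24*7 counts seconds per week. If your columns really hold milliseconds, a minimal sketch of the adjusted UDF (same columns as in the example) would be:

val udfWeekDiffMillis = udf((date1: Long, date2: Long) => ((date1 - date2) / (1000L * 60 * 60 * 24 * 7)).toInt)  // 604800000 ms per week
df.withColumn("weekDifference", udfWeekDiffMillis(df("date1"), df("date2"))).show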
Upvotes: 1