marie20
marie20

Reputation: 783

Comparing Dates by Day and Month in Spark SQL

How to compare 2 dates by Month and Day only in Spark SQL query ? My table has 2 columns, date1 and date2. I need to compare them to check if the dd & MM parts of date1 (combined) is greater than or equal to the same of date2 and set a flag accordingly.

For e.g :

date1      |    date2   | flag
2017-05-05 | 2016-10-15 |  N
2019-06-22 | 2020-02-06 |  Y
2020-10-09 | 2020-10-09 |  E

I tried something like this to extract the desired parts (as in other RDBMs) but could not get it to work. I expected the function to return MM and dd as int which I could then compare, but this returns back another date

select 
date1,
date2,
case when (date_trunc('MM', date1) > date_trunc('MM', date2) ) AND (date_trunc('dd', date1) > date_trunc('dd', date2) )
     then 'Y'
     when (date_trunc('MM', date1) = date_trunc('MM', date2) ) AND (date_trunc('dd', date1) = date_trunc('dd', date2) )
     then 'E'
     else 'N'
end as flag

Using query in Spark SQL 2.4

Thanks

Upvotes: 0

Views: 2157

Answers (1)

sathya
sathya

Reputation: 2072

the below code might help to solve your problem,

import pyspark.sql.functions as F

input_row=[["2017-05-05","2016-10-15"],["2019-06-22","2020-02-06"],["2020-10-09","2020-10-09"]]

df = spark.createDataFrame(input_row,["date1","date2"])
df.show()

'''
+----------+----------+
|     date1|     date2|
+----------+----------+
|2017-05-05|2016-10-15|
|2019-06-22|2020-02-06|
|2020-10-09|2020-10-09|
+----------+----------+
'''

Approach 1: using spark sql and month and dayofmonth built in methods


df.createOrReplaceTempView("test_table")

spark.sql("select *,case when ((month(CAST(`date1` AS DATE)) > month(CAST(`date2` AS DATE))) and (dayofmonth(CAST(`date1` AS DATE))> dayofmonth(CAST (`date2` AS DATE)))) then 'Y' when ((month(CAST(`date1` AS DATE)) < month(CAST(`date2` AS DATE))) and (dayofmonth(CAST(`date1` AS DATE))< dayofmonth(CAST (`date2` AS DATE)))) then 'N'  else 'E' end as FLAG from test_table").show()

'''
+----------+----------+----+
|     date1|     date2|FLAG|
+----------+----------+----+
|2017-05-05|2016-10-15|   N|
|2019-06-22|2020-02-06|   Y|
|2020-10-09|2020-10-09|   E|
+----------+----------+----+
'''

Approach 2. using spark sql datediff function


df.createOrReplaceTempView("test_table")

spark.sql("select *, case when datediff(date1, date2)>0 then 'Y' when datediff( date1, date2 )<0 then 'N' else 'E' end as FLAG from test_table").show()

'''
+----------+----------+----+
|     date1|     date2|FLAG|
+----------+----------+----+
|2017-05-05|2016-10-15|   Y|
|2019-06-22|2020-02-06|   N|
|2020-10-09|2020-10-09|   E|
+----------+----------+----+
'''

3. Alternate approach, using DF expressions instead of Spark SQL,

df.select(df["*"], F.expr("CASE WHEN datediff( date1, date2 )>0 THEN 'Y' " + "WHEN datediff( date1, date2 )<0 then 'N' " + "else 'E' end").alias("FLAG")).show()

'''
+----------+----------+----+
|     date1|     date2|FLAG|
+----------+----------+----+
|2017-05-05|2016-10-15|   Y|
|2019-06-22|2020-02-06|   N|
|2020-10-09|2020-10-09|   E|
+----------+----------+----+
'''

Upvotes: 1

Related Questions