Anand MP
Anand MP

Reputation: 121

SQL BETWEEN AND operation on date is not working

This is the SQL query that I'm trying to execute:

select *,count(dummy) over(partition by dummy) as total_count 
from aaca711a5e78441cdbf062f1d630ee261 
WHERE (max_timestamp BETWEEN '2017-01-01' AND '2018-01-01') 
ORDER BY max_timestamp DESC

As far as I know in a BETWEEN AND operation, both values are inclusive. Here, this query is unable to fetch records corresponding to 2018-01-01.

I changed the query to this:

select *,count(dummy) over(partition by dummy) as total_count 
from aaca711a5e78441cdbf062f1d630ee261 
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-01') 
ORDER BY max_timestamp DESC

Still, it's not working. Then I tried this:

select *,count(dummy) over(partition by dummy) as total_count 
from aaca711a5e78441cdbf062f1d630ee261 
WHERE (max_timestamp >= '2017-01-01' AND max_timestamp <= '2018-01-02') 
ORDER BY max_timestamp DESC

It's able to fetch records related to 2018-01-01.

What could be the reason for this? and how can I fix this? Thanks in advance.

Upvotes: 1

Views: 169

Answers (2)

Anand MP
Anand MP

Reputation: 121

This is a known issue with Spark.

Please refer to this link for more info: https://issues.apache.org/jira/browse/SPARK-10837

I've fixed this issue by using the date_add function provided by spark. so the last date was changed to date_add(endDate, 1) so that we'll get all the values including those corresponding to the last date.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271091

This is your query:

select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp BETWEEN '2017-01-01' AND '2018-01-01'
order by max_timestamp DESC;

Simply don't use between with date times. Use explicit logic:

select *, count(dummy) over (partition by dummy) as total_count
from aaca711a5e78441cdbf062f1d630ee261
where max_timestamp >= '2017-01-01' and
      max_timestamp < '2018-01-02'  --> notice this is one day later
order by max_timestamp DESC;

The problem is that you have a time component on the date.

Aaron Bertrand explains this very well in his blog What do BETWEEN and the devil have in common? (I am amused by the title, given that BETWEEN definitely does exist, but there is more controversy about the existence of the devil.)

Upvotes: 2

Related Questions