Reputation: 281
query:
with q1 as (
select 5 as dayid,1 as isok -- change 5 to 3 and query returns correct answer
union all
select 4 as dayid,0 as isok
)
,q2 as (
select 4 as startday,4 as endday
union all
select 8 as startday,8 as endday
)
, r1 as (
select 1 as id,* from q1 j
where (j.isok=0)
union all
select 2 as id,* from q1 j
where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
, r2 as (
select 3 as id,* from q1 j
where (j.isok=0)
or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
select 2 as level,* from r2
union all
select 1 as level,* from r1
;
returns:
LEVEL ID DAYID ISOK
- - - -
2 3 5 1
2 3 4 0
1 1 4 0
1 2 4 0
4 row(s)
It should return:
level | id | dayid | isok
-------+----+-------+------
2 | 3 | 4 | 0
1 | 1 | 4 | 0
1 | 2 | 4 | 0
(3 rows)
like that same query returns in postgres + sqlserver.
If you change select 5 to select 3 in q1, you get right result.
The problem is in query where you use "or" + "exists" + "<= + <=".
Upvotes: 3
Views: 175
Reputation: 175556
It seems that the clue is query rewrite performed behind the scene. Using Simeon's example:
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
)
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where clause_1
OR (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;
Snowflake's output:
Running only exists condition(SEMI JOIN
):
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
)
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;
With OR applied:
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
)
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
or clause_1
;
Here the query was executed as LEFT JOIN
but this time on aggregated values from q2: MIN(col1), MAX(col2).
It casues that row 3 5 1 FALSE
is added to the output.
Upvotes: 1
Reputation: 25893
so given you are unhappy with the level 2 results, we can trim the query down to just that half:
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
), r2 as (
select 3 as id,
j.*
from q1 as j
where (j.isok=0)
or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
select 2 as level,
j.*
from r2 as j
and sure enough the still problem happens.
LEVEL | ID | DAYID | ISOK |
---|---|---|---|
2 | 3 | 5 | 1 |
2 | 3 | 4 | 0 |
which is the same as:
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
)
select 3 as id,
j.*
from q1 as j
where (j.isok=0)
or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;
if we pull clause 1 j.isok=0
into the select as
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where (j.isok=0)
or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;
We get false for dayid 5 as expected so it's the exists providing that half.
if we comment out the OR
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where --(j.isok=0)
--or
(exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
we get just the dayid 4 row:
So it seems the OR is has a bug??
if we change it to a LEFT JOIN:
select 3 as id,
j.*,
j.isok=0 as clause_1,
k.*,
k.startday<=j.dayid and j.dayid<=k.endday as clause_2
from q1 as j
LEFT JOIN q2 k
ON k.startday<=j.dayid and j.dayid<=k.endday
where clause_1 OR clause_2
;
we only get the dayid 4
if we push the exists into the SELECT
select 3 as id,
j.*,
j.isok=0 as clause_1,
k.*,
k.startday<=j.dayid and j.dayid<=k.endday as clause_2,
(exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday)) AS clause_2b
from q1 as j
LEFT JOIN q2 k
ON k.startday<=j.dayid and j.dayid<=k.endday
where clause_1 OR
(exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
we get the false we are expecting in the OR
ID | DAYID | ISOK | CLAUSE_1 | STARTDAY | ENDDAY | CLAUSE_2 | CLAUSE_2B |
---|---|---|---|---|---|---|---|
3 | 4 | 0 | TRUE | 4 | 4 | TRUE | TRUE |
3 | 5 | 1 | FALSE | FALSE |
If we use CLAUSE_2B it works as expected:
select 3 as id,
j.*,
j.isok=0 as clause_1,
(exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday)) AS clause_2b
from q1 as j
where clause_1 OR clause_2b
ID | DAYID | ISOK | CLAUSE_1 | CLAUSE_2B |
---|---|---|---|---|
3 | 4 | 0 | TRUE | TRUE |
So YES there is a bug in the handling of OR and EXISTS clauses in the where.. But there is a work-around to ether use a LEFT JOIN if the data is distinct, OR put the WHERE clause into the SELECT and then refer to that.. but still totally a bug, open a support case
with the cut down problem:
with q1(dayid, isok) as (
SELECT * FROM VALUES
(5,1),
(4,0)
),q2(startday, endday) as (
SELECT * FROM VALUES
(4,4),
(8,8)
)
select 3 as id,
j.*,
j.isok=0 as clause_1
from q1 as j
where clause_1 OR (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;
Upvotes: 0