Ismo
Ismo

Reputation: 281

Snowflake bug in query(condition: "or" + "exists" + "<= + <=".)

query:

with q1 as (
  select 5 as dayid,1 as isok -- change 5 to 3 and query returns correct answer
  union all
  select 4 as dayid,0 as isok
)
,q2 as (
  select 4 as startday,4 as endday
  union all
  select 8 as startday,8 as endday
)
, r1 as (
  select 1 as id,* from q1 j
  where (j.isok=0)
  union all
  select 2 as id,* from q1 j
  where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
, r2 as (
  select 3 as id,* from q1 j
  where (j.isok=0)
     or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
select 2 as level,* from r2
union all
select 1 as level,* from r1
;

returns:

LEVEL ID DAYID ISOK
- - - -
2 3 5 1
2 3 4 0
1 1 4 0
1 2 4 0

4 row(s)

It should return:

 level | id | dayid | isok 
-------+----+-------+------
     2 |  3 |     4 |    0
     1 |  1 |     4 |    0
     1 |  2 |     4 |    0

(3 rows)

like that same query returns in postgres + sqlserver.

If you change select 5 to select 3 in q1, you get right result.

The problem is in query where you use "or" + "exists" + "<= + <=".

Upvotes: 3

Views: 175

Answers (2)

Lukasz Szozda
Lukasz Szozda

Reputation: 175556

It seems that the clue is query rewrite performed behind the scene. Using Simeon's example:

with q1(dayid, isok) as (
    SELECT * FROM VALUES
    (5,1), 
    (4,0)
),q2(startday, endday) as (
    SELECT * FROM VALUES
    (4,4), 
    (8,8)
)
select 3 as id,
    j.*,
    j.isok=0 as clause_1
from q1 as j
where clause_1 
 OR (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;

db<>fiddle demo

Snowflake's output:

enter image description here


Running only exists condition(SEMI JOIN):

with q1(dayid, isok) as (
        SELECT * FROM VALUES
        (5,1), 
        (4,0)
),q2(startday, endday) as (
        SELECT * FROM VALUES
        (4,4), 
        (8,8)
)
select 3 as id,
        j.*,
        j.isok=0 as clause_1
from q1 as j
where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;

enter image description here

With OR applied:

with q1(dayid, isok) as (
        SELECT * FROM VALUES
        (5,1), 
        (4,0)
),q2(startday, endday) as (
        SELECT * FROM VALUES
        (4,4), 
        (8,8)
)
select 3 as id,
        j.*,
        j.isok=0 as clause_1
from q1 as j
where (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday)) 
  or clause_1
    ;

Here the query was executed as LEFT JOIN but this time on aggregated values from q2: MIN(col1), MAX(col2).

enter image description here

It casues that row 3 5 1 FALSE is added to the output.

Upvotes: 1

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25893

so given you are unhappy with the level 2 results, we can trim the query down to just that half:

with q1(dayid, isok) as (
    SELECT * FROM VALUES
    (5,1), 
    (4,0)
),q2(startday, endday) as (
    SELECT * FROM VALUES
    (4,4), 
    (8,8)
), r2 as (
  select 3 as id,
    j.* 
  from q1 as j
  where (j.isok=0)
     or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
)
select 2 as level, 
    j.* 
from r2 as j

and sure enough the still problem happens.

LEVEL ID DAYID ISOK
2 3 5 1
2 3 4 0

which is the same as:

with q1(dayid, isok) as (
    SELECT * FROM VALUES
    (5,1), 
    (4,0)
),q2(startday, endday) as (
    SELECT * FROM VALUES
    (4,4), 
    (8,8)
)
select 3 as id,
j.* 
from q1 as j
where (j.isok=0)
 or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;

if we pull clause 1 j.isok=0 into the select as

select 3 as id,
    j.*,
    j.isok=0 as clause_1
from q1 as j
where (j.isok=0)
 or (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;

We get false for dayid 5 as expected so it's the exists providing that half.

if we comment out the OR

select 3 as id,
    j.*,
    j.isok=0 as clause_1
from q1 as j
where --(j.isok=0)
 --or 
 (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))

we get just the dayid 4 row:

So it seems the OR is has a bug??

if we change it to a LEFT JOIN:

select 3 as id,
    j.*,
    j.isok=0 as clause_1,
    k.*,
    k.startday<=j.dayid and j.dayid<=k.endday as clause_2
from q1 as j
LEFT JOIN q2 k 
    ON k.startday<=j.dayid and j.dayid<=k.endday
where clause_1 OR clause_2
;

we only get the dayid 4

if we push the exists into the SELECT

select 3 as id,
    j.*,
    j.isok=0 as clause_1,
    k.*,
    k.startday<=j.dayid and j.dayid<=k.endday as clause_2,
    (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday)) AS clause_2b
from q1 as j
LEFT JOIN q2 k 
    ON k.startday<=j.dayid and j.dayid<=k.endday
where clause_1 OR 
 (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))

we get the false we are expecting in the OR

ID DAYID ISOK CLAUSE_1 STARTDAY ENDDAY CLAUSE_2 CLAUSE_2B
3 4 0 TRUE 4 4 TRUE TRUE
3 5 1 FALSE FALSE

If we use CLAUSE_2B it works as expected:

select 3 as id,
    j.*,
    j.isok=0 as clause_1,
    (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday)) AS clause_2b
from q1 as j
where clause_1 OR clause_2b
ID DAYID ISOK CLAUSE_1 CLAUSE_2B
3 4 0 TRUE TRUE

So YES there is a bug in the handling of OR and EXISTS clauses in the where.. But there is a work-around to ether use a LEFT JOIN if the data is distinct, OR put the WHERE clause into the SELECT and then refer to that.. but still totally a bug, open a support case

with the cut down problem:

with q1(dayid, isok) as (
    SELECT * FROM VALUES
    (5,1), 
    (4,0)
),q2(startday, endday) as (
    SELECT * FROM VALUES
    (4,4), 
    (8,8)
)
select 3 as id,
    j.*,
    j.isok=0 as clause_1
from q1 as j
where clause_1 OR (exists (select 1 from q2 k where k.startday<=j.dayid and j.dayid<=k.endday))
;

Upvotes: 0

Related Questions