Reputation: 165
I came across strange behaviour, at least on SQL Server (I still need to check other SQL engines), while trying to delete some records.
I tested the following on SQL Servers instance:
Here is a SQL code snippet. This is simplified version of what I was trying to do, it is for sure discussable why to do a query this way, but my point is different -- why does it happen.
drop table if exists #A
drop table if exists #B
create table #B (id char, foo char) -- I use different types for id columns just for the sake of readability
create table #A (id int, b_id char) -- b_id is a link to #A, _not_ specified as FK
insert into #B values('x', 'l')
insert into #B values('y', 'm')
insert into #A values(0, 'x')
insert into #A values(1, 'z')
insert into #A values(2, 'z')
insert into #A values(3, 'y')
insert into #A values(4, 'y')
insert into #A values(5, 'y')
-- there are 2 parent records in #B and 3 child records for each of them in #A
select * from #A -- just to check, all good the data is there, all as expected, no problem
-- now the fun part
--(X) the following query, rightfully gives an error, as column b_id does not exist in #B
-- select b_id from #B where foo='l'
--(Y) the following query gives empty result, whereas I would expect an error:
select * from #A where b_id in (select b_id from #B where foo='l')
-- I can imagine that this has something to do with the fact that b_id exists in table #A in outer query
--(Z) the following query deletes(!) all data in table #A:
delete from #A where b_id in (select b_id from #B where foo='l')
select * from #A
-- once again, I can imagine that there is no error message "Invalid column name 'b_id'." because b_id exists in table #A in outer query
So here are my questions:
Upvotes: 1
Views: 421
Reputation: 95052
You are right about #A.b_id being the culprit. The query
select * from #A where b_id in (select b_id from #B where foo='l')
translates to
select * from #A where b_id in (select #A.b_id from #B where #B.foo='l')
Let's say that #B has no row with foo = 'l'. Then the subquery returns no rows and the whole query doesn't return any row, because the condition becomes where b_id in (<nothing>)
and is not met.
If, on the other hand, #B has rows with foo = 'l', the subquery returns #A.b_id for every such row. In that case the query returns all rows, because the condition becomes where b_id in (b_id, b_id, b_id, ...)
and is met.
So, it seems when trying (Y) there was no #B row with foo = 'l', but when trying (Z) there was.
Upvotes: 1
Reputation: 1608
Question 1 (Y):
The (live and actual) execution plan shows that your imagination was correct: the output from each node is shown, including the table prefix - no confusion possible:
The node 3 returns the following two columns: [#A].[id], [#A].[b_id]
, while node 4 has nothing to return but NULL
/*
id b_id
----------- ----
0 x
1 z
2 z
3 y
4 y
5 y
(6 rows affected)
Table '#B__________________________________________________________________________________________________________________000000000335'. Scan count 1, logical reads 6, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#A__________________________________________________________________________________________________________________000000000336'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Rows Executes StmtText NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type EstimateExecutions
---- -------- ------------------------------------------------------------------------------------ ------ ------ ------------ -------------- -------------------------------------------------------------- ----------------------- ------------- ------------- ------------- ----------- ---------------- ----------------------- -------- --------- ------------------
6 1 select * from #A where b_id in (select b_id from #B where foo='l') 1 0 NULL NULL NULL NULL 6 NULL NULL NULL 0,00701002 NULL NULL SELECT NULL
6 1 |--Nested Loops(Left Semi Join) 2 1 Nested Loops Left Semi Join NULL NULL 6 0 2,508E-05 12 0,00701002 [#A].[id], [#A].[b_id] NULL PLAN_ROW 1
6 1 |--Table Scan(OBJECT:([tempdb].[dbo].[#A]), WHERE:([#A].[b_id]=[#A].[b_id])) 3 2 Table Scan Table Scan OBJECT:([tempdb].[dbo].[#A]), WHERE:([#A].[b_id]=[#A].[b_id]) [#A].[id], [#A].[b_id] 6 0,003125 0,0001636 12 0,0032886 [#A].[id], [#A].[b_id] NULL PLAN_ROW 1
6 6 |--Table Scan(OBJECT:([tempdb].[dbo].[#B]), WHERE:([#B].[foo]='l')) 4 2 Table Scan Table Scan OBJECT:([tempdb].[dbo].[#B]), WHERE:([#B].[foo]='l') NULL 1 0,0032035 8,07E-05 9 0,0036877 NULL NULL PLAN_ROW 6
(4 rows affected)
*/
Question 2: The query does return results.
Question 3:
SELECT * FROM #A WHERE b_id IN (SELECT b_id FROM #B WHERE foo = 'l');
SELECT * FROM #A WHERE EXISTS (SELECT 'YAAAY' FROM #B WHERE foo = 'l');
SELECT * FROM #A WHERE EXISTS (SELECT 'YAAAY' FROM #B WHERE foo = 'asdafadsf');
I think that the OP's query can be rewritten into an equivalent EXISTS query.
The b_id
will always be equal to b_id
, except when there are no results returned from the IN ()
part.
Upvotes: 1
Reputation: 1560
The query
select * from #A where b_id in (select b_id from #B where foo='l')
uses the b_id from #A in the subquery.
If you want to produce an error you would need to add the table name:
select * from #A where b_id in (select #B.b_id from #B where foo='l')
This will give you: "Invalid column name 'b_id'."
I can't reproduce your case 'Y' with the given script. Both queries return the same result:
select * from #A
select * from #A where b_id in (select b_id from #B where foo='l')
Upvotes: 1