rezdm
rezdm

Reputation: 165

No error while executing a query with wrong column name

I came across strange behaviour, at least on SQL Server (I still need to check other SQL engines), while trying to delete some records.

I tested the following on SQL Servers instance:

Here is a SQL code snippet. This is simplified version of what I was trying to do, it is for sure discussable why to do a query this way, but my point is different -- why does it happen.

    drop table if exists #A
    drop table if exists #B

    create table #B (id char, foo char) -- I use different types for id columns just for the sake of readability
    create table #A (id int, b_id char) -- b_id is a link to #A, _not_ specified as FK

    insert into #B values('x', 'l')
    insert into #B values('y', 'm')

    insert into #A values(0, 'x')
    insert into #A values(1, 'z')
    insert into #A values(2, 'z')
    insert into #A values(3, 'y')
    insert into #A values(4, 'y')
    insert into #A values(5, 'y')

    -- there are 2 parent records in #B and 3 child records for each of them in #A
    select * from #A -- just to check, all good the data is there, all as expected, no problem

    -- now the fun part

    --(X) the following query, rightfully gives an error, as column b_id does not exist in #B
    -- select b_id from #B where foo='l'

    --(Y) the following query gives empty result, whereas I would expect an error:
    select * from #A where b_id in (select b_id from #B where foo='l')
    -- I can imagine that this has something to do with the fact that b_id exists in table #A in outer query

    --(Z) the following query deletes(!) all data in table #A:
    delete from #A where b_id in (select b_id from #B where foo='l')
    select * from #A
    -- once again, I can imagine that there is no error message "Invalid column name 'b_id'." because b_id exists in table #A in outer query

So here are my questions:

  1. Why there is no error message about invalid column in queries (Y) and (Z)? I would be interested in details
  2. Depending on the answer (1), would be interesting to know why query (Y) provides empty result. It is clear that if inner select is empty, then outer should be empty, but devil hides in details
  3. Why query (Z) deletes all records from table #A? I would expect that affected records (returned in case of (Y) and deleted in case of (Z)) should be the same

Upvotes: 1

Views: 421

Answers (3)

Thorsten Kettner
Thorsten Kettner

Reputation: 95052

You are right about #A.b_id being the culprit. The query

select * from #A where b_id in (select b_id from #B where foo='l')

translates to

select * from #A where b_id in (select #A.b_id from #B where #B.foo='l')

Let's say that #B has no row with foo = 'l'. Then the subquery returns no rows and the whole query doesn't return any row, because the condition becomes where b_id in (<nothing>) and is not met.

If, on the other hand, #B has rows with foo = 'l', the subquery returns #A.b_id for every such row. In that case the query returns all rows, because the condition becomes where b_id in (b_id, b_id, b_id, ...) and is met.

So, it seems when trying (Y) there was no #B row with foo = 'l', but when trying (Z) there was.

Upvotes: 1

milivojeviCH
milivojeviCH

Reputation: 1608

Question 1 (Y):

The (live and actual) execution plan shows that your imagination was correct: the output from each node is shown, including the table prefix - no confusion possible: The node 3 returns the following two columns: [#A].[id], [#A].[b_id], while node 4 has nothing to return but NULL

/*

         id b_id
----------- ----
          0 x
          1 z
          2 z
          3 y
          4 y
          5 y

(6 rows affected)

Table '#B__________________________________________________________________________________________________________________000000000335'. Scan count 1, logical reads 6, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table '#A__________________________________________________________________________________________________________________000000000336'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Rows  Executes StmtText                                                                             NodeId Parent PhysicalOp   LogicalOp      Argument                                                       DefinedValues            EstimateRows    EstimateIO   EstimateCPU  AvgRowSize TotalSubtreeCost OutputList              Warnings Type      EstimateExecutions
----  -------- ------------------------------------------------------------------------------------ ------ ------ ------------ -------------- -------------------------------------------------------------- ----------------------- ------------- ------------- ------------- ----------- ---------------- ----------------------- -------- --------- ------------------
   6         1 select * from #A where b_id in (select b_id from #B where foo='l')                        1      0 NULL         NULL           NULL                                                           NULL                                6          NULL          NULL        NULL       0,00701002 NULL                    NULL     SELECT                  NULL
   6         1   |--Nested Loops(Left Semi Join)                                                         2      1 Nested Loops Left Semi Join NULL                                                           NULL                                6             0     2,508E-05          12       0,00701002 [#A].[id], [#A].[b_id]  NULL     PLAN_ROW                   1
   6         1        |--Table Scan(OBJECT:([tempdb].[dbo].[#A]), WHERE:([#A].[b_id]=[#A].[b_id]))       3      2 Table Scan   Table Scan     OBJECT:([tempdb].[dbo].[#A]), WHERE:([#A].[b_id]=[#A].[b_id])  [#A].[id], [#A].[b_id]              6      0,003125     0,0001636          12        0,0032886 [#A].[id], [#A].[b_id]  NULL     PLAN_ROW                   1
   6         6        |--Table Scan(OBJECT:([tempdb].[dbo].[#B]), WHERE:([#B].[foo]='l'))                4      2 Table Scan   Table Scan     OBJECT:([tempdb].[dbo].[#B]), WHERE:([#B].[foo]='l')           NULL                                1     0,0032035      8,07E-05           9        0,0036877 NULL                    NULL     PLAN_ROW                   6

(4 rows affected)

*/

Execution Plan - Node 4

Question 2: The query does return results.

Question 3:

SELECT * FROM #A WHERE b_id IN (SELECT b_id    FROM #B WHERE foo = 'l');
SELECT * FROM #A WHERE  EXISTS (SELECT 'YAAAY' FROM #B WHERE foo = 'l');
SELECT * FROM #A WHERE  EXISTS (SELECT 'YAAAY' FROM #B WHERE foo = 'asdafadsf');

I think that the OP's query can be rewritten into an equivalent EXISTS query.

The b_id will always be equal to b_id, except when there are no results returned from the IN () part.

Query3-EXISTS-vs-IN

Upvotes: 1

TomStroemer
TomStroemer

Reputation: 1560

The query

select * from #A where b_id in (select b_id from #B where foo='l')

uses the b_id from #A in the subquery.

If you want to produce an error you would need to add the table name:

select * from #A where b_id in (select #B.b_id from #B where foo='l')

This will give you: "Invalid column name 'b_id'."

I can't reproduce your case 'Y' with the given script. Both queries return the same result:

select * from #A
select * from #A where b_id in (select b_id from #B where foo='l')

Upvotes: 1

Related Questions