Reputation: 31

Strange behavior with SQL server IN clause

I am wondering why the below SQL statement behaves the way that it does:

select * 
from tableA 
where document_id in (select document_id
                      from  tableB 
                      where batch_id = 99997)

tableA contains the column document_id but tableB does not, so the query appears to return all rows in tableA, this behavior occurs if you use any field name in the select statement of the IN clause that is a field in tableA. Using a name not in tableA or tableB does result in an error.

Upvotes: 3

Answers (3)

Mike T

Reputation: 59

I've just found this behaviour myself after many years. I use Exists() instead of the IN() clause except for ad-hoc and non data modifying queries because of unexpected results in the past with null values, but this had me flummoxed until I read DavidG's response. There is no reason to write a query in such a manner, referencing an outer column without referencing the subqueries table at all but a syntax checker would have trouble disallowing it.

Mostly if using IN() with a sub query you would be referencing a foreign key which, 95+% of the time (with our company standards), has the same name as the primary key, but those 5% of times could catch you out.

Although it seems this behavior also occurs with Exists()...

Select * from tableA Where Exists(Select document_id from tableB)

...Exists() is never written this way because it makes no sense but is written more explicitly something like...

Select * From tableA Where Exists(Select * from tableB tB where tB.TableB_Document_id = tableA.document_id)

and so less likely to occur.

(Note: I would've put this in a comment but I don't have the points)

Upvotes: 0

arserbin3

Reputation: 6148

This is why I'd suggest you get in the habit of putting the explicit table name for each column. It is also helpful for maintainability or later extending queries.

select * 
from tableA A
where A.document_id in (select B.document_id
                  from  tableB B
                  where B.batch_id = 99997)

If you qualify the tables like this, it will throw a clear error on runtime and prevent any subtle mistakes. This would give the error similar to

tableB.document_id does not exist.

Upvotes: 3

DavidG

Reputation: 118937

It's not an error. In a subquery, you can still use columns form the parent. So when you say

SELECT document_id FROM tableB WHERE batch_id = 99997

You are saying for every row where batch_id is 9997 in tableB, select document_id from tableA. And of course, all those document_id values exists, so it will return all those rows.

Upvotes: 5

Strange behavior with SQL server IN clause

Answers (3)

Related Questions