Reputation: 2413
In a SELECT
statement :
SELECT name
FROM users
WHERE address IN (addr_a, addr_b, addr_c, ...);
We know that it will select all person's names whose address is in (addr_a, addr_b, addr_c, ...). But I want to know what it actually do when executing this statement.
For example, does it search every element in the table to check if its address is in (addr_a, ...) ?
If addr_a
, addr_b
is too long, does it slow down the search process?
Is there any material about these stuff to be recommended ?
Edit: I didn't specify a RDBMS, because I would like to know as many SQL implementations as possible.
Edit again: Here I got answers about MySQL and SQL Server and I accepted the "SQL Server" one as it's a detailed answer. Welcome for more answers about other RDBMS.
Upvotes: 0
Views: 1636
Reputation: 18559
Since you haven't specified which RDBMS are your question about, I am going to write how it works on SQL Server, trying to simplify it a bit and avoid much of technicalities. It might be same or very similar on different systems, but it also might be completely different.
What SQL Server is going to do with your query
`SELECT name FROM users WHERE address IN (addr_a, addr_b, addr_c, ...);`
depends almost entirely on what kind of indexes do you have on a table. Here are a 3 basic scenarios:
Scenario 1 (good index)
If you have what is called Covering Index, which would mean either a PK or clustered index on column address
or non-clustered index on address
which include name
, SQL Server will do something called Index Seek
. It means it will go through index's tree structure and quickly pinpoint the exact row you need (or find it's not existing). Since name
column is also included in index, it will read it and return right from there.
Scenario 2 (not-so-good index)
This is the case when you have index on column address
, which does not include column name
. You might find these kind of indexes - on only one column - very often, but as you'll find out soon they are pretty useless most of the time. What you are hoping here that SQL Server goes through your index structure (seek) and quickly finds the row with your address. However as column name
is not there now, it can only get rowID (or PK) where the row actually is, so it will for each row returned do additional reading of another index or table to find your row and retrieve name. Since that takes 3 times more reading then scenario 1, SQL Server will more often then not decide that it's cheaper to just go through all rows of table rather than to use your index. And that is explained in scenario 3.
Scenario 3 (no usable index)
This will happen if you don't have indexes at all or no indexes on column address. Simply speaking SQL Server goes through all the rows and check every row for your condition. This is called Index Scan
(or Table Scan
if there are no indexes at all). Usually the worst
case scenario and slowest at all.
Hopes that helps to clarify things a bit.
As for the other sub-question about long string slowing down - the answer for this case would be 'probably not much'. When SQl Server compares two strings, it goes character-by-character, so if the first letters of both strings are different, it will not check further. However if you put a wildcard % on beginning of your string ie: WHERE address LIKE '%addr_a'
SQL Server will have to check every character of every string in column and therefore work much slower.
Upvotes: 3
Reputation: 2680
@Xu : An execution plan is created for the select query and based on that plan the final execution is done. Please check this basic documentation related to Execution Plan for more details.
Upvotes: 1
Reputation: 191749
The documentation explains exactly what it does.
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search.
Therefore the order of the arguments actually doesn't matter as MySQL sorts them for comparison anyway.
Upvotes: 2