JavaNoScript
JavaNoScript

Reputation: 2413

What does a SQL SELECT statement actually do during execution?

In a SELECT statement :

SELECT name 
FROM users 
WHERE address IN (addr_a, addr_b, addr_c, ...);

We know that it will select all person's names whose address is in (addr_a, addr_b, addr_c, ...). But I want to know what it actually do when executing this statement.

For example, does it search every element in the table to check if its address is in (addr_a, ...) ?

If addr_a, addr_b is too long, does it slow down the search process?

Is there any material about these stuff to be recommended ?

Edit: I didn't specify a RDBMS, because I would like to know as many SQL implementations as possible.

Edit again: Here I got answers about MySQL and SQL Server and I accepted the "SQL Server" one as it's a detailed answer. Welcome for more answers about other RDBMS.

Upvotes: 0

Views: 1636

Answers (3)

Nenad Zivkovic
Nenad Zivkovic

Reputation: 18559

Since you haven't specified which RDBMS are your question about, I am going to write how it works on SQL Server, trying to simplify it a bit and avoid much of technicalities. It might be same or very similar on different systems, but it also might be completely different.

What SQL Server is going to do with your query

`SELECT name FROM users WHERE address IN (addr_a, addr_b, addr_c, ...);`

depends almost entirely on what kind of indexes do you have on a table. Here are a 3 basic scenarios:

Scenario 1 (good index)

If you have what is called Covering Index, which would mean either a PK or clustered index on column address or non-clustered index on address which include name, SQL Server will do something called Index Seek. It means it will go through index's tree structure and quickly pinpoint the exact row you need (or find it's not existing). Since name column is also included in index, it will read it and return right from there.

Scenario 2 (not-so-good index)

This is the case when you have index on column address, which does not include column name. You might find these kind of indexes - on only one column - very often, but as you'll find out soon they are pretty useless most of the time. What you are hoping here that SQL Server goes through your index structure (seek) and quickly finds the row with your address. However as column name is not there now, it can only get rowID (or PK) where the row actually is, so it will for each row returned do additional reading of another index or table to find your row and retrieve name. Since that takes 3 times more reading then scenario 1, SQL Server will more often then not decide that it's cheaper to just go through all rows of table rather than to use your index. And that is explained in scenario 3.

Scenario 3 (no usable index)

This will happen if you don't have indexes at all or no indexes on column address. Simply speaking SQL Server goes through all the rows and check every row for your condition. This is called Index Scan (or Table Scan if there are no indexes at all). Usually the worst case scenario and slowest at all.

Hopes that helps to clarify things a bit.

As for the other sub-question about long string slowing down - the answer for this case would be 'probably not much'. When SQl Server compares two strings, it goes character-by-character, so if the first letters of both strings are different, it will not check further. However if you put a wildcard % on beginning of your string ie: WHERE address LIKE '%addr_a' SQL Server will have to check every character of every string in column and therefore work much slower.

Upvotes: 3

Sukane
Sukane

Reputation: 2680

@Xu : An execution plan is created for the select query and based on that plan the final execution is done. Please check this basic documentation related to Execution Plan for more details.

Upvotes: 1

Explosion Pills
Explosion Pills

Reputation: 191749

The documentation explains exactly what it does.

If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search.

Therefore the order of the arguments actually doesn't matter as MySQL sorts them for comparison anyway.

Upvotes: 2

Related Questions