Index on a column with like and where pattern?

I've came to this conclusion that Like in below cases does seek/scan But I didn't get Why does it scan in 1st case and Seek in 2nd case. I understood the 3rd case.

SELECT c.contactname FROM Sales.Customers c
WHERE c.contactname LIKE '%a'-- Does a Scan 1st Case

SELECT c.contactname FROM Sales.Customers c
WHERE c.contactname LIKE 'a%'-- Does a Seek 2nd Case

SELECT c.contactname FROM Sales.Customers c
WHERE c.contactname LIKE '%a%'-- Does a Scan

If i build a index on contactname which has below sample data how would be the index tree.. like we if build for numbers it will compare less than greater than and will do traversing how index tree will be traverse in below case.

c.contactname

mark
anna
krishna
nadejda
allen
bob
cab

Upvotes: 5

Answers (3)

Zane

Reputation: 4169

Ok so to break this down for you when you create your index on your character string column. It will store them by the first letter so your data is stored like this.

allen
anna
bob
cab
krishna
mark
nadejda

So when you are doing your LIKE 'a%' Sql server can use the index to pare down the results by looking through the order for everything with an 'A' in the begining. However once you have placed your wildcard at the end LIKE '%A' you have now forced the engine to scan through the entire table to look for anything ending with the letter A.

Of course this is an incredibly simplified explanation.

Upvotes: 4

Esoteric Screen Name

Reputation: 6112

It does a scan in the first case for the same reason as in the third case: the index searches based on data from the start of the string. With a wildcard at the start of your pattern, there's no way to intelligently search the index for a match. The index does a lexographical (alphabetical) comparison for strings, which is a less than/greater than comparison.

The tree might look something like this:

        /nadejda
    mark
   /    \krishna
cab
   \    /bob
    anna
        \allen

So, searching for a% can be done, as the index will know where to go at each branch. E.g., C > A, so go left. Searching for %a isn't efficiently possible. The index would have to read all the data to figure out if each node ended with A. Having to read all the data means using the index is just wasted overhead.

Upvotes: 4

Gordon Linoff

Reputation: 1270713

This is, indeed, the way that SQL Server works. You are asking why.

Think about the index. Functionally, you can think of it as storing the contactname in alphabetical order -- much like a dictionary or telephone book. (Yes, it is usually a more complicated data structure, typically a B-tree, but the result is the items in order).

When you say contactname like 'a%', then the query optimizer knows that it only needs to look at entries that start with the letter "a". The index knows exactly where those are, so the optimizer can use a seek to get to them. SQL Server implements this optimization for like (not all databases do this).

When you say contactname like '%a', you are saying "find me the entries that end in 'a'". That is a lot like looking through a dictionary to get all the words that end in "a". The ordering is not of any help. There might be entries that start with "a" and end with "a'. There might be entries that start with "z" and end with "a". So, these types of expressions require a scan instead of a seek.

Upvotes: 4

Index on a column with like and where pattern?

Answers (3)

Related Questions