Reputation: 28875

In SQL Server, is a sort by short substring more efficient than a sort by the entire, long field?

Consider the following SQL Server 2005/2008 query:

Select [UID], [DESC] From SomeTable Order By [Desc];

If Desc is a fairly long field (Varchar(125) with many entries > 70 chars) and you don't need a strict sorting, would it would be more efficient to do this:

Select [UID], [DESC] From SomeTable Order By Substring([Desc], 0, 20);

The advantage is that all comparisons are pretty short (20 characters, max). The disadvantage is that it incurs the Substring call. For present purposes, assume that you don't want to put an index on this field as this is not a primary key and the above is a fairly rare operation. Which option would you choose?

Note 2: I'm asking mostly out of curiosity here. In my application, Desc is an indexed field and I am not using Substring. However, I briefly considered using Substring and it occurred to me that I didn't truly know which of the above approaches would be more efficient.

Finaly, a bonus question: is it true that using Substring on an Indexed field would make the optimizer skip the index and really slow things down? I don't think the optimizer is smart enough to use the Index if Substring is used (even with a zero base) but I am a bit too busy to test it out right now. However, if you know differently, please correct me!

Update/clarification: you should be assuming that the Desc field is not indexed for purposes of the original question. If it is indexed, the answer is pretty easy.

Upvotes: 2

Answers (5)

Kibbee

Reputation: 66142

Something you might want to consider is this. When sorting strings, assuming good optimized algorithms, you don't have to analyze the entire string to find out which string comes first. Consider the two strings

F3294r02343232423
B3920490234324234

You only have to analyze the first character of each before you know that the second string should come first. I'm not sure how much this comes into play with your specific data set, but it's something you should think about.

As a test, you may want to create a copy of your table with the exact same data and indexes, but truncate the field you are sorting on to 20 characters, and see if there is any noticeable increase in speed due to the smaller amount of data. If there is a significant performance increase, you may want to go with what Robert wrote, and create a second column with the data already truncated so you don't have to use the substring function.

Upvotes: 1

Robert

Reputation: 1845

Your last part is completely true.

As for the sorting issue whether it's quicker to sort on a substring of the first 20 characters. If the string is say 30 characters the answer is no, if 300 characters then maybe yes. I don't know where the boundary would lie. But it will go through character by character sorting. If 21 characters it's quicker to not have the extra overhead of the substring and let it check the extra 1 character.

What you could do is have a further column which is a truncated description and sort on this instead.

Upvotes: 1

Quassnoi

Reputation: 425713

Use of a non-clustered index implies an implicit JOIN.

The index itself does not contain the non-indexed values, it contains only references to the TABLE's blocks.

To get the non-indexed values, you need to scan over the index and read from these blocks in a nested loop.

As a rule of thumb, INDEX SCAN WITH TABLE LOOKUP is about 10 times as costly as the TABLE SCAN.

If you need all the results of an ordered query, especially as a part or a more complex query implying the nested loops, it's sometimes more efficient to perform a TABLE SCAN and sort the results.

Table needs to be sorted only once and results of the sort will be kept and reused. In this case, SUBSTRING may be more efficient.

If you need 5% of ordered results or less, then the INDEX SCAN will be more efficient, in this case you need to sort on the whole column.

Also, index lookup is always more responsive, as you get the first rows faster.

Upvotes: 1

Kristen

Reputation: 4301

You've said to ignore the fact that [Desc] is indexed, however given that you say that it is indexed and assuming [UID] is the PK and using a Clustered Index, your query is "covered" by the index on [Desc], and thus SQL is going to read the records in index order ... so putting a SUBSTRING will cause it to have to take an extra step to sort by first 20 characters, whereas they were already read in sorted order

is it true that using Substring on an Indexed field would make the optimizer skip the index and really slow things down

Generally yes, if the field is in the WHERE clause. Any function applied to a field in the where clause is likely to cause the optimiser to skip the indexes. Generally speaking.

Upvotes: 0

Otávio Décio

Reputation: 74290

I don't think so. Calling the function will cause the most damage to the performance in this case. And yes, functions are very likely to make the optimizer avoid indexes.

Upvotes: 0

In SQL Server, is a sort by short substring more efficient than a sort by the entire, long field?

Answers (5)

Related Questions