user1085351
user1085351

Reputation: 105

SQL Server - nvarchar(max) full text index useful when doing exact match?

I have a column in a table which is of type nvarchar(max) and there are some scenarios where I need to perform an exact match on the contents of that column.

I know that I can create a full text index which broadly speaking, as I understand it, tokenises the text allowing more efficient queries when wanting to search within the string. I was wondering whether, when performing an exact match, whether a full text index is actually of any use at all in terms of increasing performance?

Are there any better alternatives?

Upvotes: 1

Views: 2020

Answers (2)

Matt W
Matt W

Reputation: 31

I know this is an old question, and I would comment on JNK's answer but I don't have the rep to do so...

First, since you're using an Nvarchar, you've got to be very careful to make sure that strings which compare equal in your collation hash equally; unless you're using a binary collation, this won't happen unless your hash algorithm is Unicode-aware or you first normalize your strings. Unicode allows different representations of the same characters, for example É can be represented as codepoint U+00C9, or as codepoint U+0045 (E) followed by codepoint U+0301 (combining acute).

Second, a cryptographic hash algorithm like MD5 doesn't match up well with the need here, where you're hashing for performance not security. You don't need to spend that much CPU in every insert and at the start of every query, and you don't need your index key to be that large. What you want is almost the .NET StringComparer.GetHashCode() function, which is fast, accounts for characters that are logically but not binary equal, and generates a small hashcode which can therefore be compared extremely quickly. Sadly MS reserves the right to change that algorithm at will, which would screw up any stored hashes. If you're going CLR anyway, I'd probably recommend stealing the appropriate GetHashCode implementation from the Mono project - their class libraries are MIT-licensed so you can lift them at will as long as you preserve copyright notices in the source.

Upvotes: 3

JNK
JNK

Reputation: 65217

If all you need to check for is an exact match, you could create a computed column which is the hash of the nvarchar(max) field.

This would be small enough to be indexable but still would indicate if fields matched exactly or not.

The general idea would be:

ALTER TABLE MyTable
ADD HashField as HASHBYTES('MD5', LongfieldName)

Upvotes: 3

Related Questions