Pete Amundson
Pete Amundson

Reputation: 900

SQL Server - Is there an easy way to ignore quotes when searching?

I have a site where I need to be able to search for data and have the query ignore all quotes.

  1. Search for don't, don’t or dont and retrieve results for rows that have words that start with: don't, don’t or dont
  2. Search for "hello" or “hello” or hello and retrieve results for rows that have words that start with: "hello", “hello” or hello

Note: I already am stripping out the quotes for the passed in search term

I want to know if there is an easier (or less verbose) method than:

select Name 
  from tbl_MyTable
 where (Replace(Replace(Replace(Replace(Replace(Replace(Name,'“',''),'‘',''),'''',''),'"',''),'’',''),'”','') like 'dont%' 
    or Replace(Replace(Replace(Replace(Replace(Replace(Name,'“',''),'‘',''),'''',''),'"',''),'’',''),'”','') like '% dont%' );

Right now, my best idea is to create a new column that contains the quote-stripped version (prepended with a space) so that I can just do:

select Name 
  from tbl_MyTable
 where FixedName like '% dont%';

But I would really like to know if this can be accomplished without creating a new column and have it be efficient.

Upvotes: 6

Views: 687

Answers (5)

Dave Hilditch
Dave Hilditch

Reputation: 5439

Use a fulltext index instead of LIKE.

Create your fulltext index:

http://msdn.microsoft.com/en-us/library/ms187317.aspx

CREATE UNIQUE INDEX ix1 ON tbl_MyTable(YourKey); //unique index required
CREATE FULLTEXT CATALOG ft AS DEFAULT; // ft is your freetext catalog name
CREATE FULLTEXT INDEX ON tbl_MyTable(Name) 
   KEY INDEX ix1
   WITH STOPLIST = SYSTEM; // this is your index and allows you to run the command below

Then use this to run your query:

SELECT Name 
FROM tbl_MyTable 
WHERE FREETEXT(Name, 'dont');

That's the fastest technique for this kind of thing. You can get even faster if you use third party free-text engines but there's probably no need for that.

Upvotes: 1

Nacho
Nacho

Reputation: 153

Try the following to return all names back without any quotes or double quotes. This will prevent the necessity of the LIKE statements, avoid the necessity for another column, and speed up your query:

SELECT Replace(
    Replace(
        Replace(
            Replace(
                Replace(
                    Replace( Name, '“', ''),
                '‘', ''),
            '''',''),
        '"', ''),
     '’',''),
'”', '') AS Name 
FROM tbl_MyTable

Upvotes: 0

Matt
Matt

Reputation: 4785

Efficient in space or in time?

Your first solution is space-efficient, but likely time-inefficient due to applying multiple string functions to each row in the table, every time the query is executed.

The solution of a generated column is space-inefficient, but likely time-efficient due to applying the string operations once (when you add the column and then at insert/update).

From the perspective of your users, the best solution is likely to be searches performed on a generated column.

Upvotes: 0

Michael Fredrickson
Michael Fredrickson

Reputation: 37398

I would suggest creating a user defined function to consolidate this logic:

CREATE FUNCTION [dbo].[udf_StripQuotes]
(
    @String VARCHAR(MAX)
)  
RETURNS VARCHAR(MAX)
AS  
BEGIN 
    RETURN Replace(
        Replace(
            Replace(
                Replace(
                    Replace(
                        Replace(@String,'“',''),
                    '‘',''),
                '''',''),
            '"',''),
        '’',''),
    '”','')
END
GO

Which then looks like:

select Name 
from tbl_MyTable
where dbo.udf_StripQuotes(name) like '% dont%';

As far as efficiency, the leading and trailing % in your like statement will prevent you from using any indexes, which will cause a full table scan... this is probably the greatest performance hit to this query.

However, like Aaron clarifies, this implementation will be slower than the original due to the overhead of calling the UDF.

If you can avoid the leading wildcard, then a computed column with an index would likely improve performance.

Otherwise, I think your only other option would be to implement Full-Text Search.

Upvotes: 0

Aaron Bertrand
Aaron Bertrand

Reputation: 280431

This isn't an answer to the question, but will be really hard to implement as a comment.

If you're going to use a UDF to simplify the query itself, do yourself a favor and limit the function calls to the number of rows you have, not double. Instead of:

 where dbo.udf_StripQuotes(name) like 'dont%' 
  or dbo.udf_StripQuotes(name) like '% dont%' );

Do this:

 where ' ' +  dbo.udf_StripQuotes(name) like '% dont%';

As far as the underlying problem, I agree with Michael that an indexed computed column might be best, but this won't be possible if the name column exceeds 900 bytes (and this won't magically turn a scan to a seek, due to the wildcards, it just removes the need to call a function or perform all those replace calls in the query).

Upvotes: 0

Related Questions