Matt Grum
Matt Grum

Reputation: 21

efficiency of SQL 'LIKE' statement with large number of clauses

I need to extract information from a text field which can contain one of many values. The SQL looks like:

SELECT fieldname
FROM table
WHERE bigtextfield LIKE '%val1%'
   OR bigtextfield LIKE '%val2%'
   OR bigtextfield LIKE '%val3%'
   .
   .
   .
   OR bigtextfield LIKE '%valn%'

My question is: how efficient is this when the number of values approaches the hundreds, and possibly thousands? Is there a better way to do this?

One solution would be to create a new table/column with just the values I'm after and doing the following:

SELECT fieldname
FROM othertable
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Which I imagine is a lot more efficient as it only has to do exact string matching. The problem with this is that it will be a lot of work keeping this table up to date.

btw I'm using MS SQL Server 2005.

Upvotes: 2

Views: 4073

Answers (4)

HLGEM
HLGEM

Reputation: 96552

The best solution for this is to redesign and get rid of that field that is storing multiple values and make it a related table instead. This violates one of the first rules of database design.

You should not be storing multiple values in one field and dead slow queries like this are the reason why. If you can't do that then full-test indexing is your only hope.

Upvotes: 1

intgr
intgr

Reputation: 20456

This functionality is already present in most SQL engines, including MS SQL Server 2005. It's called full-text indexing; here are some resources:

Upvotes: 5

marc_s
marc_s

Reputation: 754230

I don't think the main problem is the number of criteria values - but the sheer fact that a WHERE clause with bigtextfield LIKE '%val1%' can never really be very efficient - even with just a single value.

The trouble is the fact that if you have a placeholder like "%" at the beginning of your search term, all the indices are out the window and cannot be used anymore.

So you're basically just searching each and every entry in your table doing a full table scan in the process. Now your performance basically just depends on the number of rows in your table....

I would support intgr's recommendation - if you need to do this frequently, have a serious look at fulltext indexing.

Upvotes: 2

Quassnoi
Quassnoi

Reputation: 425251

This will inevitable require a fullscan (over the table or over an index) with a filter.

The IN condition won't help here, since it does not work on LIKE

You could do something like this:

SELECT  *
FROM    master
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    values
        WHERE   name LIKE '%' + value + '%'
        )

, but this hardly be more efficient.

All literal conditions will be transformed to a CONSTANT SCAN which is just like selecting from the same table, but built in memory.

Upvotes: 1

Related Questions