Reputation: 948
I have a scenario where I need to perform following operation:
SELECT *
FROM
[dbo].[MyTable]
WHERE
[Url] LIKE '%<some url>%';
I have to use two % (wildcard characters) at the beginning and the end of Url ('%<some url>%'
) as user should be able to search the complete url even if he types partial text. For example, if url is http://www.google.co.in and user types "goo", then the url must appear in search results. LIKE
operator is causing performance issues. I need an alternative so that I can get rid of this statement and wildcards. In other words, I don't want to use LIKE statement in this scenario. I tried using T-SQL CONTAINS
but it is not solving my problem. Is there any other alternative available than can perform pattern matching and provide me results quickly?
Upvotes: 4
Views: 50791
Reputation: 45106
Starting a like with a % is going to cause a scan. No getting around it. It has to evaluate every value.
If you index the column it should be an index (rather than table) scan.
You don't have an alternative that will not cause a scan.
Charindex and patindex are alternatives but will still scan and not fix the performance issue.
Could you break the components out into a separate table?
www
google
co
in
And then search on like 'goo%'?
That would use an index as it does not start with %.
Better yet you could search on 'google' and get an index seek.
And you would want to have the string unique in that table with a separate join on Int PK so it does not return multiple www for instance.
Suspect FullText Contains was not faster because FullText kept the URL as one word.
Upvotes: 9
Reputation: 95101
Your query is a very simple one and I see no reason for it to be slow. The dbms wil read record for record and compare strings. Usually it can even do this in parallel threads.
What do you think can be the reason for your statement being so slow? Are there billions of records in your table? Do your records contain so much data?
Your best bet is not to care about the query, but about the database and your system. Others have already suggested an index on the url column, so rather than scanning the table, the index can be scanned. Is max degree of parallelism mistakenly set? Is your table fragmented? Is your hardware appropriate? These are the things to consider here.
However: charindex('oogl', url) > 0
does the same as url like '%oogl%'
, but internally they work differently somehow. For some people the LIKE expression turned out faster, for others the CHARINDEX method. Maybe it depends on the query, number of processors, operating system, whatever. It may be worth a try.
Upvotes: 2
Reputation: 24490
To my knowledge there's no alternative to like
or contains
(full text search feature) which would give better performance.
What you can do is try to improve performance by optimising your query.
To do that, you need to know a bit about your users & how they'll use your system.
I suspect most people will enter a URL from the start of the address (i.e. without protocol), so you could do something like this:
declare @searchTerm nvarchar(128) = 'goo'
set @searchTerm = coalesce(replace(@searchTerm ,'''',''''''),'')
select @searchTerm
SELECT *
FROM [dbo].[MyTable]
WHERE [Url] LIKE 'http://' + @searchTerm + '%'
or [Url] LIKE 'https://' + @searchTerm + '%'
or [Url] LIKE 'http://www.' + @searchTerm + '%'
or [Url] LIKE 'https://www.' + @searchTerm + '%'
or [Url] LIKE '%' + @searchTerm + '%'
option (fast 1); --get back the first result asap;
That then gives you some optimisation; i.e. if the url's http://www.google.com the index on the url column can be used since http://www.goo is at the start of the string.
The option (fast 1)
piece on the end's to ensure this benefit is seen; since the last URL like %searchTerm%
can't make use of indexes, we'd rather return responses as soon as we can rather than wait for that slow part to complete.
Have a think of other common usage patterns and ways around those.
Upvotes: 1
Reputation: 119186
You could create a FULLTEXT index.
First create your catalog:
CREATE FULLTEXT CATALOG ft AS DEFAULT;
Now assuming your table is called MyTable
, the column is TextColumn
and it has a unique index on it called UX_MyTable_TextColumn
:
CREATE FULLTEXT INDEX ON [dbo].[MyTable](TextColumn)
KEY INDEX UX_MyTable_TextColumn
Now you can search the table using CONTAINS:
SELECT *
FROM MyTable
WHERE CONTAINS(TextColumn, 'searchterm')
Upvotes: 1
Reputation: 28809
As written, your query cannot be further optimized, and there is no way of getting around the LIKE
to do your searching. The only thing you can do to improve performance is reduce the SELECT
to return only the columns you need if you don't need all of them, and create an index on URL
with those columns included. The LIKE
will not be able to use the index for seeking, but the reduced data size for scanning can help. If you have a SQL Server edition that supports compression, that will help as well.
For instance, if you really need only column A, write
SELECT A FROM [dbo].[MyTable] WHERE [Url] LIKE '%<some url>%';
And create the index as
CREATE INDEX IX_MyTable_URL
ON MyTable([Url])
INCLUDE (A) WITH (DATA_COMPRESSION = PAGE);
If A is already included in your primary key, the INCLUDE is unnecessary.
Upvotes: 0