Lobo
Lobo

Reputation: 566

Problems with LIKE command in SQL Server

I'm trying to execute a query containing LIKE in a SQL Server 2008 database, but for some reason the query takes forever and times out.

The table contains around 47 million rows with aggregated log data, and I'm trying to find a logentry for a specific machine containing a specific application name. My query looks like this:

SELECT MgmtLogID,MgmtLogSeverity, MgmtLogSource, CAST(MgmtLogText as TEXT) as MgmtLogText,   MgmtLogTime, MgmtLogHost 
FROM [dbo].[MgmtLog] 
--Fixed values
WHERE MgmtLogOrigin = 'EventLog' AND MgmtLogSeverity <= 3 
--Values depending on what I'm searching for
AND MgmtLogHost = 'MY MACHINENAME'AND MgmtLogTime > 'MY START TIME' AND MgmtLogText LIKE '% KEYWORD TO SEARCH FOR %' 
ORDER BY MgmtLogTime DESC

The query executes in around 1-2 seconds without the LIKE and returns around 10 rows. With the LIKE it should return 2 rows out of those ten so it shouldn't be that taxing but it times out. I'm guessing that it has something to do with the properties of MgmLogText, but I'm not sure what. It is an ntext field that has length 16 and uses Finnish_Swedish_CI_AS collation.

In the end I need to execute the query from a php script since I need to find log records for a arbitrary number of machines and/or applications

Upvotes: 0

Views: 544

Answers (2)

Daniel
Daniel

Reputation: 1151

Depending on the field type of MgmtLogText, the indexes won't be used. Also, as mentioned by other commenters, the LIKE also prevents the index from being used.

Off the top of my head, I wonder if it would work if you use a subquery. The inside query should be the one without the LIKE which only returns 10 results. Then the outside query should be the one that uses the LIKE. That way the LIKE only has to search 10 rows instead of 47 million.

There's probably a more efficient way, but I was thinking something like this:

SELECT MgmtLogID,MgmtLogSeverity, MgmtLogSource, CAST(MgmtLogText as TEXT) as MgmtLogText,   MgmtLogTime, MgmtLogHost 
FROM [dbo].[MgmtLog] 
WHERE MgmtLogID IN (
  SELECT MgmtLogID
  FROM [dbo.MgmtLog]
  WHERE MgmtLogOrigin = 'EventLog' AND MgmtLogSeverity <= 3 
  AND MgmtLogHost = 'MY MACHINENAME'AND MgmtLogTime > 'MY START TIME'
  ORDER BY MgmtLogTime DESC
)
AND MgmtLogText LIKE '%some value%'

Upvotes: 2

paxdiablo
paxdiablo

Reputation: 882606

Query clauses like where colname like '%something%' are not able to take advantage of indexes and usually result in a full scan of the possible rows to ascertain which ones should be delivered

Although, as ChrisC points out in a comment, it's somewhat surprising that the more efficient clauses aren't firt used to reduce the candidate rowset down to a manageable size before trying to use like - perhaps the statistics for the table are not up to date enough for the query analysis to decide this - best run whatever counts for an explain query under SQL Server.

The reason your non-like query is so fast is because it almost certainly has an index on MgmtLogHost and/or MgmtLogTime which can be used to quickly cull unneeded rows.

One way you can fix this is to use something like insert/update triggers to process the MgmtLogText data only when changed, to extract the application names out and put them in a separate table which can be far better optimised.

Even just using such a trigger to keep a lowercased version of the column (in another column) would be an improvement. Using a case-insensitive collation means that selects run slower since they have to allow for XYZZY and xyzzy being classed as equal. If instead you maintain a lower-cased version in the table and ensure the check is done against lower case, that effort disappears as you only have one case to worry about.

And, by doing all this in the trigger, you ensure that it's only done when necessary (when the data is changed), not every time you want to select. This amortises the cost over many selects.

You can also use something like full text indexing if your DBMS supports it but I've often thought that was like trying to kill mosquitos with a thermo-nuclear warhead.

Yes, there are situations where you may need full text indexing but, in the vast majority of cases, you can gain efficiency by being a little more selective.

Upvotes: 1

Related Questions