Andrew

Reputation: 5083

Character-matching queries in SQL

I'm trying to optimize a T-SQL stored procedure. It pulls records based on a VIN (a 17-character alphanumeric string), and people usually know only a few of the characters. For example, the first character could be '1', '2', or 'J'; the second is 'H'; the third could be 'M' or 'G'; and so on.

This leads to a pretty convoluted query whose WHERE clause is something like

WHERE SUBSTRING(VIN,1,1) IN ('J','1','2')
AND SUBSTRING(VIN,2,1) IN ('H')
AND SUBSTRING(VIN,3,1) IN ('M','G')
AND SUBSTRING(VIN,4,1) IN ('E')
AND ... -- and so on for however many digits we need to search on

The table I'm querying on is huge (millions of records) so the queries I'm running that have this kind of WHERE clause can take hours to run if there are more than a couple digits being searched on, even if I'm only requesting the top 3000 records. I feel like there has to be a way to get this substring character matching to run faster. Hours are completely unacceptable; I'd like to have these kinds of queries run in just a few minutes.

I don't have any editing privileges on the database, sadly, so I can't add indexes or anything like that; all I can do is change my stored procedure (although I can try to beg the DBAs to modify the table).

Upvotes: 3

Views: 1040

Answers (3)

Martin Smith

Reputation: 453243

You can use

WHERE VIN LIKE '[J12]H[MG]E%'

At least that should hopefully lead to 3 index seeks on the ranges JH%, 1H%, and 2H% rather than a full scan.

Edit: Testing locally, I found that it does not do multiple index seeks as I had hoped. Instead, it converts the above to a single seek on the larger range VIN >= '1' AND VIN < 'K', with a residual predicate to evaluate the LIKE.

I'm not sure whether it will do the same on larger tables, but if it does, it may well be worth trying to encourage the three-seek plan with

WHERE (VIN LIKE 'JH%' OR VIN LIKE '1H%' OR VIN LIKE '2H%')
  AND VIN LIKE '[J12]H[MG]E%'
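
Since the known characters change from search to search, the pattern can be assembled inside the stored procedure rather than hard-coded. The following is only a sketch under assumptions: the @known table variable, its columns, and the table name dbo.Vehicles are hypothetical, and it assumes the allowed characters are plain alphanumerics (no ']', '^', or '-', which have special meaning inside brackets). Whether the optimizer seeks on a pattern supplied this way is something to verify in the actual execution plan.

```sql
-- Hypothetical input: one row per known position, listing the allowed characters.
DECLARE @known TABLE (pos INT PRIMARY KEY, chars VARCHAR(36));
INSERT INTO @known (pos, chars)
VALUES (1, 'J12'), (2, 'H'), (3, 'MG'), (4, 'E');

DECLARE @pattern VARCHAR(200) = '';
DECLARE @pos INT = 1;
DECLARE @maxPos INT = (SELECT MAX(pos) FROM @known);
DECLARE @chars VARCHAR(36);

WHILE @pos <= @maxPos
BEGIN
    -- Reset first: the assignment below leaves @chars untouched when no row matches.
    SET @chars = NULL;
    SELECT @chars = chars FROM @known WHERE pos = @pos;

    -- Unknown positions become '_' (any single character);
    -- known positions become a bracketed set such as [J12].
    SET @pattern = @pattern
        + CASE WHEN @chars IS NULL THEN '_' ELSE '[' + @chars + ']' END;
    SET @pos = @pos + 1;
END

SET @pattern = @pattern + '%';   -- here: [J12][H][MG][E]%

SELECT TOP (3000) *
FROM dbo.Vehicles                -- hypothetical table name
WHERE VIN LIKE @pattern;
```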

Upvotes: 3

AllenG

Reputation: 8190

I like the LIKE answers, but here's another alternative (especially if your input isn't always the same).

I would do this as a series of queries on ever-smaller temp tables. (Yes, I'm in love with temp tables; sue me.)

So I would do something like

SELECT [Fields]
INTO #tempResultsFirstTwoDigits
FROM VIN
WHERE [Clause]

Then keep moving down the chain digit by digit until you've searched each of the provided characters. So you might do this:

IF LEN(@input) > 2
    SELECT [Fields]
    INTO #tempResultsThreeDigits
    FROM #tempResultsFirstTwoDigits -- query the previous, smaller result set
    WHERE SUBSTRING(VIN, 3, 1) = SUBSTRING(@input, 3, 1)
    -- NOTE: That WHERE clause might be sped up by initializing a variable at
    --       the beginning of the SP for each character you got.
ELSE
BEGIN
    SELECT * FROM #tempResultsFirstTwoDigits
    GOTO Stop -- where the label "Stop:" marks the end of the SP, skipping any further checks
END

Again, LIKE might be a better answer for you, but I would try both approaches and benchmark both of them.
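
For a concrete picture of the narrowing idea, here is a minimal two-step sketch using the characters from the question. The table name dbo.Vehicles and the temp table names are assumptions, not anything from the original procedure, and whether the first step is actually fast depends on an index on VIN existing.

```sql
-- Step 1: narrow on the leading characters. A fixed-prefix LIKE can use
-- an index on VIN if the DBAs have one in place.
SELECT VIN
INTO #step1
FROM dbo.Vehicles                -- hypothetical table name
WHERE VIN LIKE 'JH%' OR VIN LIKE '1H%' OR VIN LIKE '2H%';

-- Step 2: apply the remaining per-character checks to the much smaller temp table.
SELECT VIN
INTO #step2
FROM #step1
WHERE SUBSTRING(VIN, 3, 1) IN ('M', 'G')
  AND SUBSTRING(VIN, 4, 1) = 'E';

SELECT TOP (3000) VIN
FROM #step2;

-- Clean up explicitly instead of waiting for the session to end.
DROP TABLE #step1, #step2;
```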

Upvotes: 1

John Hartsock

Reputation: 86872

You could use the LIKE keyword:

SELECT
  *
FROM Table
WHERE VIN LIKE '[J12]H[MG]E%'

This would even let you handle cases where they know the second character is not 'A', by using [^A] in the pattern, such as:

WHERE VIN LIKE '[J12][^A][MG]E%'

Reference: http://msdn.microsoft.com/en-us/library/ms179859.aspx
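
When a position in the middle is completely unknown, the `_` wildcard (exactly one character, any value) can hold its place so the later positions still line up. A small self-contained illustration follows; the VINs in it are made-up sample values, not real data.

```sql
-- Made-up sample rows, only to show which patterns match.
DECLARE @demo TABLE (VIN CHAR(17));
INSERT INTO @demo (VIN) VALUES
    ('JHMEJ6674XS000001'),
    ('1HGEJ6674XS000002'),
    ('2HMXJ6674XS000003'),
    ('JAMEJ6674XS000004');

-- 1st char is J, 1, or 2; 2nd is anything but A; 3rd is unknown; 4th is E.
SELECT VIN
FROM @demo
WHERE VIN LIKE '[J12][^A]_E%';
-- Returns the first two rows: the third fails on position 4 ('X'),
-- the fourth fails on position 2 ('A').
```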

Upvotes: 2
