MAK

Reputation: 7260

Performance tuning on PATINDEX with JOIN

I have a table called tbl_WHO with 90 million records and a temp table #EDU with just 5 records.

I want to do pattern matching on the name field between the two tables (tbl_WHO and #EDU).

Query: The following query took 00:02:13 to execute.

SELECT  Tbl.PName,Tbl.PStatus     
FROM tbl_WHO Tbl 
INNER JOIN #EDU Tmp 
ON 
(
    (ISNULL(PATINDEX(Tbl.PName,Tmp.FirstName),'0')) > 0 
)    

Sometimes I have to do pattern matching on more than one column, like:

SELECT  Tbl.PName,Tbl.PStatus     
FROM tbl_WHO Tbl 
INNER JOIN #EDU Tmp 
ON 
(
    (ISNULL(PATINDEX(Tbl.PName,Tmp.FirstName),'0')) > 0 AND
    (ISNULL(PATINDEX('%'+Tbl.PAddress+'%',Tmp.Addres),'0')) > 0 OR
    (ISNULL(PATINDEX('%'+Tbl.PZipCode,Tmp.ZCode),'0')) > 0  
)    

Note: There are indexes on the columns used in these conditions.

Is there any other way to tune the query performance?

Upvotes: 1

Views: 338

Answers (1)

gotqn

Reputation: 43646

Searches starting with % are not sargable, so even with an index on the given column you are not going to be able to use it effectively.
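
For illustration, here is a rough sketch using the question's tbl_WHO and the index on PName mentioned in the question (the 'Smith%' literal is just a hypothetical pattern):

-- Sargable: the pattern has a fixed prefix, so an index seek on PName is possible
SELECT PName, PStatus FROM tbl_WHO WHERE PName LIKE 'Smith%'

-- Not sargable: the leading % forces every row's PName to be examined
SELECT PName, PStatus FROM tbl_WHO WHERE PName LIKE '%Smith%'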

Are you sure you need to search with PATINDEX each time? A table with 90 million records is not huge, but having many columns and not applying normalization correctly can certainly hurt performance.

I would advise revising the table and checking whether the data can be normalized further. This can lead to better performance in particular cases and reduce the table's storage as well.

For example, the zip code can be moved to a separate table, and instead of joining on the zip code string you can join on an integer column. Try to normalize the address further - do you have a city, a street or block, a street or block number? The same goes for the names - if you need to search by first and last name, split the names into separate columns. A sketch of the zip code idea follows.
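
This is only a rough sketch of that idea - the ZipCode table, the ZipCodeId columns and the assumption that #EDU carries the same key are hypothetical, not something from the question:

CREATE TABLE dbo.ZipCode
(
    ZipCodeId INT IDENTITY(1,1) PRIMARY KEY,
    ZipCode   VARCHAR(10) NOT NULL UNIQUE
);

-- tbl_WHO keeps only the integer key to the lookup table
ALTER TABLE dbo.tbl_WHO ADD ZipCodeId INT
    REFERENCES dbo.ZipCode (ZipCodeId);

-- assuming #EDU also carries the same ZipCodeId, the join becomes a plain
-- integer equality instead of a PATINDEX over strings
SELECT Tbl.PName, Tbl.PStatus
FROM tbl_WHO Tbl
INNER JOIN #EDU Tmp
    ON Tbl.ZipCodeId = Tmp.ZipCodeId;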

For string values, the data can be sanitized - for example, trimming spaces at the beginning and at the end. With clean data like that, we can create hash indexes and get extremely fast equality searches.
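
In SQL Server terms that could look roughly like this - a sketch in which a persisted CHECKSUM computed column stands in for the hash index (the PNameHash column and the 'John Smith' value are made-up examples):

-- clean the data once: trim leading/trailing spaces
UPDATE dbo.tbl_WHO SET PName = LTRIM(RTRIM(PName));

-- persisted checksum of the cleaned value, indexed for fast equality probes
ALTER TABLE dbo.tbl_WHO ADD PNameHash AS CHECKSUM(PName) PERSISTED;
CREATE INDEX IX_tbl_WHO_PNameHash ON dbo.tbl_WHO (PNameHash);

-- equality search: probe the narrow hash index first, then recheck the
-- string itself to rule out checksum collisions
SELECT PName, PStatus
FROM dbo.tbl_WHO
WHERE PNameHash = CHECKSUM('John Smith')
  AND PName = 'John Smith';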

What I want to say is that if you normalize your data and add some rules (at the database and application level) to ensure the input data is correct, you are going to get very good performance. It is the long way, but if you are going to do it, it is easier to do now than later (and you are already late as it is).

Upvotes: 1
