Kron
Kron

Reputation: 483

SQL Server More Efficient Substring

I am running a query in SQL Server where I need to join two tables into one where the full name field matches the partial name field in another after apostrophes have been removed. For a code example the join is happening like this:

from [Data1]
right join [Data2]
    on replace([Data2].[PartialName], '''','')=Substring([Data1].[FullName],1,1+LEN(replace([Data2].[PartialName], '''','')))

And it works. But it takes what would be a 10 second execution if we just used where name=name and makes it take around 20 minutes. This is rather unacceptable in terms of run time so I was wondering if anyone had any more efficient alternatives to consider.

Btw Data 1 has about 800 lines and Data2 has about 1.6 million if it's relevant.

Edit: I've been told I need to give a bit more descriptive information. Basically in this example Data1 is a table from an outside source that contains a name field [FullName] which contains people's full names in the form of 'Last-Name , First-Name Middle-Name(s)' with any apostrophes removed (for example in the name O'Neil it would just be ONeil). So an example would be 'ONeil , Sarah Conner'

Data2 contains a name field that has names in the form 'Last-Name , First-Name' Middle names are omitted and apostrophes are intact. So for example 'O'Neil , Sarah'

These tables need to be merged together on their name fields, hence the logic above.

Upvotes: 1

Views: 259

Answers (1)

Thom A
Thom A

Reputation: 95557

DavidG is right, a PERSISTED column is the way to go here. After drinking a little more coffee, I think you need a computed column and then LIKE in your JOIN. The PERSISTED column's SQL would be something like:

ALTER TABLE [Data2] ADD PartialName_na AS REPLACE(PartialName,'''','') PERSISTED;

You may want to add that to an index. Then your new (pseudo) SQL query would be:

SELECT ...
FROM Data2 D2
     LEFT JOIN Data1 D1 ON D1.FullName = D2.PartialName_na + '%';

There's no need to use SUBSTRING. LIKE will maintain SARGability here, it doesn't use a leading wildcard.

Edit: Couple of notes. I used the _na suffix to stand for "No Apostrophe"; you can call the column whatever you want. I also changed the query from a RIGHT JOIN to a LEFT JOIN. Personally I feel that LEFT JOINs are much easier to read, however, if you want to swap it back round, feel free.

Upvotes: 3

Related Questions