daniness
daniness

Reputation: 383

Joining on numeric part of string

It's been a while...I'd like to get your advice on the most efficient way to join on only the number part of a field that may be prefixed and/or suffixed with up to 2 letters. Here's a simplified snippet of what I'm trying to do:

 SELECT a, b, c
 FROM table 1 t1
 LEFT JOIN table 2 t2 ON t1.PolicyCode = t2.sPolicyID,

Where t2.sPolicyID could begin and/or end with up to 2 letters. Some examples: TG73100, S7286674, 2344506R, etc. We only want to join to just its numeric part in between the letters, i.e. 73100, 7286674 or 2344506 from the examples.

Could someone please advise on a simple way of doing this?

Upvotes: 1

Views: 1144

Answers (1)

Aaron Dietz
Aaron Dietz

Reputation: 10277

Here is one way:

LEFT JOIN table 2 t2 ON t1.PolicyCode =

LEFT(SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50),
     PATINDEX('%[^0-9]%', 
     SUBSTRING(t2.sPolicyID, PATINDEX('%[0-9]%', t2.sPolicyID), 50)
     + 'a') -1)

To break this down, there are 4 main parts.

1: Find the position of the first number with PATINDEX:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

SELECT PATINDEX('%[0-9]%', @spolicyID) 

--Returns 3

2: Use SUBSTRING() to cut off everything before the first letter:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

SELECT SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50) 

--Returns 123123xx

If we hardcoded the 3 that we know is returned from the first part, it would look like this:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

SELECT SUBSTRING(@spolicyID, 3), 50) 

--50 is the number of characters to extract, set to something 
--higher than the max string length to be safe

Of course, we don't want to hardcode it since it can change, but that makes seeing the different functions a bit easier.

3: Find the position of the next letter using PATINDEX again:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

SELECT PATINDEX('%[^0-9]%', SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50) + 'a')

--Returns 7 since it is looking at 123123xx
--The first x is in the 7th position

Note that we added an a onto the string. This is because if we had a string with no letters at the end, it would throw an error as the length 0 would be returned to SUBSTRING. You could add any letter or letters to the end and it would work, we are just making sure there is at least one. Try removing the + 'a' and using a string like xx123123 to see the error.

If we hardcoded the 123123xx from step 2 it would look like this (again just for easy example):

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

SELECT PATINDEX('%[^0-9]%', '123123xx' + 'a')

4: Use LEFT() to return everything before the trailing letters, leaving us with only the numbers in between:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

LEFT(SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50),PATINDEX('%[^0-9]%', SUBSTRING(@spolicyID, PATINDEX('%[0-9]%', @spolicyID), 50) + 'a') -1)

--Need to add `-1` because step 3 PATINDEX returns 7
--as the position of first trailing letter, and
--we want the 6 characters before that

And again hardcoded from step 2 and 3 for easy viewing:

DECLARE @spolicyID VARCHAR(20) = 'xx123123xx'

LEFT('123123xx', 7-1)

Upvotes: 3

Related Questions