zain ul abidin
zain ul abidin

Reputation: 195

Specific string matching

I am working in SQL Server 2012. In my table, there is a column called St_Num and its data is like this:

St_Num                 status
------------------------------
128 TIMBER RUN DR      EXP
128 TIMBER RUN DRIVE   EXP

Now we can notice that there are spelling variations in the data above. What I would like to do is that if the number in this case 128 and first 3 letters in St_Num column are same then these both rows should be considered the same like this the output should be:

St_Num                status
-----------------------------
128 TIMBER RUN DR     EXP

I did some search regarding this and found that left or substring function can be handy here but I have no idea how they will be used here to get what I need and don't know even if they can solve my issue. Any help regarding how to get the desired output would be great.

Upvotes: 0

Views: 94

Answers (5)

Emil
Emil

Reputation: 336

I still have odd feeling that your criteria is not enough to match same addresses but this might help, since it considers also length of the number:

WITH ParsedAddresses(st_num, exp, number)
AS
(
    SELECT st_num,
    exp,
    number = ROW_NUMBER() OVER(PARTITION BY LEFT(st_num, CHARINDEX(' ', st_num) + 3) ORDER BY LEN(st_num))
    FROM <table_name>
)
SELECT st_num, exp FROM ParsedAddresses
WHERE number = 1

Upvotes: 0

forpas
forpas

Reputation: 164194

This will output only the first of the matching rows:

with cte as (
  select *,
  row_number() over (order by (select null)) rn
  from tablename
)
select St_Num, status from cte t
where not exists (
  select 1 from cte
  where 
    left(St_Num, 7) = left(t.St_Num, 7)
    and
    rn < t.rn
)

See the demo

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271003

I don't really approve of your matching logic . . . but that is not your question. The big issue is how long is the number before the string. So, you can get the shortest of the addresses using:

select distinct t.*
from t
where not exists (select 1
                  from t t2
                  where left(t2.st_num, patindex('%[a-zA-Z]%') + 2, t.st_num) = left(t.st_num, patindex('%[a-zA-Z]%', t.st_num) + 2) and
                        len(t.St_Num) < len(t2.St_Num)
                 );

Upvotes: 1

Barbaros &#214;zhan
Barbaros &#214;zhan

Reputation: 65408

You can use grouping by status and substring(St_Num,1,3)

with t(St_Num, status) as
(
 select '128 TIMBER RUN DR'   ,'EXP' union all
 select '128 TIMBER RUN DRIVE','EXP'   
) 
select min(St_Num) as St_Num, status
  from t    
 group by status, substring(St_Num,1,3);

St_Num              status
-----------------   ------
128 TIMBER RUN DR   EXP

Upvotes: 1

GeekKat
GeekKat

Reputation: 24

This could possibly be done by using a subquery in the same way that you would eliminate duplicates in a table so:

SELECT Str_Num, status
FROM <your_table> a
WHERE NOT EXISTS (SELECT 1
FROM <your_table> b
WHERE SUBSTRING(b.Str_Num, 1, 7) = SUBSTRING(a.Str_Num, 1, 7));

This would only work however if the number is guaranteed to be 3 characters long, or if you don't mind it taking more characters in the case that the number is fewer characters.

Upvotes: 1

Related Questions