Reputation: 195
I am working in SQL Server 2012. In my table, there is a column called St_Num
and its data is like this:
St_Num status
------------------------------
128 TIMBER RUN DR EXP
128 TIMBER RUN DRIVE EXP
Now we can notice that there are spelling variations in the data above. What I would like to do is that if the number in this case 128 and first 3 letters in St_Num
column are same then these both rows should be considered the same like this the output should be:
St_Num status
-----------------------------
128 TIMBER RUN DR EXP
I did some search regarding this and found that left or substring function can be handy here but I have no idea how they will be used here to get what I need and don't know even if they can solve my issue. Any help regarding how to get the desired output would be great.
Upvotes: 0
Views: 94
Reputation: 336
I still have odd feeling that your criteria is not enough to match same addresses but this might help, since it considers also length of the number:
WITH ParsedAddresses(st_num, exp, number)
AS
(
SELECT st_num,
exp,
number = ROW_NUMBER() OVER(PARTITION BY LEFT(st_num, CHARINDEX(' ', st_num) + 3) ORDER BY LEN(st_num))
FROM <table_name>
)
SELECT st_num, exp FROM ParsedAddresses
WHERE number = 1
Upvotes: 0
Reputation: 164194
This will output only the first of the matching rows:
with cte as (
select *,
row_number() over (order by (select null)) rn
from tablename
)
select St_Num, status from cte t
where not exists (
select 1 from cte
where
left(St_Num, 7) = left(t.St_Num, 7)
and
rn < t.rn
)
See the demo
Upvotes: 1
Reputation: 1271003
I don't really approve of your matching logic . . . but that is not your question. The big issue is how long is the number before the string. So, you can get the shortest of the addresses using:
select distinct t.*
from t
where not exists (select 1
from t t2
where left(t2.st_num, patindex('%[a-zA-Z]%') + 2, t.st_num) = left(t.st_num, patindex('%[a-zA-Z]%', t.st_num) + 2) and
len(t.St_Num) < len(t2.St_Num)
);
Upvotes: 1
Reputation: 65408
You can use grouping by status
and substring(St_Num,1,3)
with t(St_Num, status) as
(
select '128 TIMBER RUN DR' ,'EXP' union all
select '128 TIMBER RUN DRIVE','EXP'
)
select min(St_Num) as St_Num, status
from t
group by status, substring(St_Num,1,3);
St_Num status
----------------- ------
128 TIMBER RUN DR EXP
Upvotes: 1
Reputation: 24
This could possibly be done by using a subquery in the same way that you would eliminate duplicates in a table so:
SELECT Str_Num, status
FROM <your_table> a
WHERE NOT EXISTS (SELECT 1
FROM <your_table> b
WHERE SUBSTRING(b.Str_Num, 1, 7) = SUBSTRING(a.Str_Num, 1, 7));
This would only work however if the number is guaranteed to be 3 characters long, or if you don't mind it taking more characters in the case that the number is fewer characters.
Upvotes: 1