Reputation: 7260
I have the following table:
Table:
CREATE TABLE str_matching
(
colstr varchar(200)
);
Insert data:
INSERT INTO str_matching VALUES('5sXYZA1010B')
INSERT INTO str_matching VALUES('A1010B')
INSERT INTO str_matching VALUES('AMZ103B15K')
INSERT INTO str_matching VALUES('B15K')
INSERT INTO str_matching VALUES('XC101')
INSERT INTO str_matching VALUES('C101')
INSERT INTO str_matching VALUES('502KMD1FZ10009L')
INSERT INTO str_matching VALUES('FZ10009L')
INSERT INTO str_matching VALUES('A9L')
INSERT INTO str_matching VALUES('XZ049L')
INSERT INTO str_matching VALUES('LM101')
Expected Output: I want to display only those records that has duplicate entries, if one string match last part of any string then I am considering as duplicate.
For example: I have two strings
5sXYZA1010B
A1010B
2nd string which is matching at end of 1st string, so want to display such records.
Note: Length of string's are not fixed, it can be match at any point.
Expected Result:
colstr
--------------------
5sXYZA1010B
A1010B
AMZ103B15K
B15K
XC101
C101
502KMD1FZ10009L
FZ10009L
Upvotes: 1
Views: 157
Reputation: 453288
This should do it (demo)
SELECT DISTINCT CA.colstr
FROM str_matching s1
JOIN str_matching s2
ON s1.colstr <> s2.colstr
AND s2.colstr LIKE '%' + s1.colstr
CROSS APPLY (VALUES(s1.colstr),
(s2.colstr)) CA(colstr)
However if str_matching
has many rows performance will be poor. Adding an index on the reverse of the string can substantially improve things - as in the example below.
CREATE TABLE str_matching
(
colstr varchar(200),
colstr_rev AS REVERSE(colstr)
);
CREATE INDEX ix_colstr_rev on str_matching(colstr_rev)
SELECT colstr = REVERSE(CA.colstr_rev)
FROM str_matching s1
JOIN str_matching s2
ON s1.colstr_rev <> s2.colstr_rev
AND s2.colstr_rev LIKE s1.colstr_rev + '%'
CROSS APPLY (VALUES(s1.colstr_rev),
(s2.colstr_rev)) CA(colstr_rev)
GROUP BY CA.colstr_rev
Upvotes: 2
Reputation: 38023
it can be match at any point.
To match any point, use like
with wildcards (%
) on each side:
For multiple matches, add distinct
select l.colstr
from str_matching l
inner join str_matching r
on l.colstr<>r.colstr
and (l.colstr like '%' + r.colstr +'%'
or r.colstr like '%' + l.colstr +'%'
)
rextester demo: http://rextester.com/ICIKJ2256
returns:
+-----------------+
| colstr |
+-----------------+
| A1010B |
| 5sXYZA1010B |
| B15K |
| AMZ103B15K |
| C101 |
| XC101 |
| FZ10009L |
| 502KMD1FZ10009L |
+-----------------+
Given Martin Smith's answer, I'm guessing I misunderstood your intent behind 'match at any point', so to just match the end of the string you would not use the trailing + '%'
Upvotes: 1
Reputation: 13393
You can use it.
;WITH CTE AS (
select *,RN = ROW_NUMBER() OVER (ORDER BY LEN(colstr)) from str_matching
)
,CTE2 AS (
SELECT T1.colstr colstr1 ,X.colstr colstr2 FROM CTE T1
CROSS APPLY (SELECT * FROM CTE T2 WHERE T2.RN > T1.RN AND RIGHT(T2.colstr, LEN(T1.colstr)) = T1.colstr) AS X
)
SELECT colstr1 FROM CTE2
UNION ALL
SELECT colstr2 FROM CTE2
Result:
5sXYZA1010B
A1010B
AMZ103B15K
B15K
XC101
C101
502KMD1FZ10009L
FZ10009L
Upvotes: 1