Reputation: 367
Suppose I have a table of strings, like this:
VAL
-----------------
Content of values
Values identity
Triple combo
my combo
sub-zero combo
I want to find strings which have equal words. The result set should be like
VAL MATCHING_VAL
------------------ ------------------
Content of values Values identity
Triple combo My combo
Triple combo sub-zero combo
or at least something like this. Can you help?
Upvotes: 5
Views: 199
Reputation: 1269513
One method is to use a hack for regular expressions:
select t1.val, t2.val
from t t1 join
t t2
on regexp_like(t1.val, replace(t2.val, ' ', '|');
You might want the case to be identical as well:
on regexp_like(lower(t1.val), replace(lower(t2.val), ' ', '|');
Upvotes: 7
Reputation: 35323
Using some of the [oracle internal similiarity] found in UTL_Match (https://docs.oracle.com/database/121/ARPLS/u_match.htm#ARPLS71219) matching...
This logic is more for matching names or descriptions that are 'Similar' and where phonetic spellings or typo's may cause the records not to match.
By adjusting the .5 below you can see how the %'s get you closer and closer to perfect matches.
with cte as (
select 'Content of values' val from dual union all
select 'Values identity' val from dual union all
select 'triple combo' from dual union all
select 'my combo'from dual union all
select 'sub-zero combo'from dual)
select a.*, b.*, utl_match.edit_distance_similarity(a.val, b.val) c, UTL_MATCH.JARO_WINKLER(a.val,b.val) JW
from cte a
cross join cte b
where UTL_MATCH.JARO_WINKLER(a.val,b.val) > .5
order by utl_match.edit_distance_similarity(a.val, b.val) desc
and screenshot of query/output.
Or we could use an inner join and > if we only want one way compairisons...
select a.*, b.*, utl_match.edit_distance_similarity(a.val, b.val) c, UTL_MATCH.JARO_WINKLER(a.val,b.val) JW
from cte a
inner join cte b
on A.Val > B.Val
where utl_match.jaro_winkler(a.val,b.val) > .5
order by utl_match.edit_distance_similarity(a.val, b.val) desc
this returns the 3 desired records.
But this does not explicitly check each any word matches.
which was your base requirement. I just wanted you to be aware of alternatives.
Upvotes: 1
Reputation: 420
You could use a combination of SUBSTRING and LIKE.
use charIndex(" ") to split the words up in the substring if thats what you want to do.
Upvotes: 1