Reputation: 23

retrieving similar data in oracle

Lets say I have the following table for example:

id  name
---------
1   Matt
2   Ryan
3   Joseph
4   Matt1
5   5Joseph
6   David
7   Matt_43

We see that Matt and Joseph has been repeated more than once i.e Matt, Matt1 and Matt_43. Similarly Joseph is repeating twice.

Is there a way to retrieve such kind of data?

Upvotes: 0

Answers (5)

William Robertson

Reputation: 16001

Depending on how you are defining 'similar', you could look at soundex() or utl_match:

with demo (id, name) as
    ( select 1, 'Matt' from dual union all
      select 2, 'Ryan' from dual union all
      select 3, 'Joseph' from dual union all
      select 4, 'Matt1' from dual union all
      select 5, '5Joseph' from dual union all
      select 6, 'David' from dual union all
      select 7, 'Matt_43' from dual )
select id, name
     , soundex(name)
     , count(*) over(partition by soundex(name)) as count_similar
     , utl_match.jaro_winkler_similarity(name, 'Matt') as similarity_to_matt
from   demo
order by similarity_to_matt desc;

     ID NAME      SOUNDEX(NAME) COUNT_SIMILAR SIMILARITY_TO_MATT
------- --------- ------------- ------------- ------------------
      1 Matt      M300                      3                100
      4 Matt1     M300                      3                 96
      7 Matt_43   M300                      3                 91
      2 Ryan      R500                      1                 50
      6 David     D130                      1                 48
      5 5Joseph   J210                      2                  0
      3 Joseph    J210                      2                  0

Upvotes: 1

zarruq

Reputation: 2465

Assuming that you need to retrieve data having similar text in column name, you can join the table to itself, use like in join condition and return distinct records as below.

select distinct t1.id as id, 
       t1.name as name
from table1 t1
join table1 t2
  on ((t1.name like '%'|| t2.name ||'%' or t2.name like '%'|| t1.name ||'%') 
      and t1.id <> t2.id);

Result:

+----+---------+
| ID |  NAME   |
+----+---------+
|  4 | Matt1   |
|  7 | Matt_43 |
|  1 | Matt    |
|  3 | Joseph  |
|  5 | 5Joseph |
+----+---------+

DEMO

Update:

If you don't want the result to be case-sensitive, use upper

select distinct t1.id as id, 
       t1.name as name
from table1 t1
join table1 t2
  on ((upper(t1.name) like '%'|| upper(t2.name) ||'%' or upper(t2.name) like '%'|| upper(t1.name) ||'%') 
      and t1.id <> t2.id)

Upvotes: 1

LukStorms

Reputation: 29667

You could self-join your table with a LIKE

For example:

select t1.id as id1, t1.name as name1, t2.id as id2, t2.name as name2
from your_table t1
join your_table t2
  on (upper(t1.name) like '%'|| upper(t2.name) ||'%' and t1.id <> t2.id)

Upvotes: 2

ARr0w

Reputation: 1731

since oracle is very case sensitive even when 2 words are same where one starts with capital letter and second with small,

oracle will consider them two different data, to keep them in your search you would require this approach: select name from emp where UPPER(name) like '%UPPER(Ryan)%';

Upvotes: 1

user8718405

Reputation:

you can use like wildcard characters that have operator like '%matt%' this will return all the matt value even if the prefix or postfix would not match

here is the query

select name from emp where name like '%matt%' or name like '%joseph%'

Upvotes: 1

retrieving similar data in oracle

Answers (5)

Related Questions