Andrew
Andrew

Reputation: 7778

Check if string variations exists in another string

I need to check if a partial name matches full name. For example:

Partial_Name   |  Full_Name
--------------------------------------
John,Smith     |  Smith William John
Eglid,Timothy  |  Timothy M Eglid

I have no clue how to approach this type of matching.

Another thing is that name and last name may come in the wrong order, making it harder.

I could do something like this, but this only works if names are in the same order and 100% match

decode(LOWER(REGEXP_REPLACE(Partial_Name,'[^a-zA-Z'']','')), LOWER(REGEXP_REPLACE(Full_Name,'[^a-zA-Z'']','')), 'Same', 'Different')

Upvotes: 1

Views: 174

Answers (3)

Andrew
Andrew

Reputation: 7778

This is what I ended up doing... Not sure if this is the best approach. I split partials by comma and check if first name present in full name and last name present in full name. If both are present then match.

CASE 
    WHEN 
     instr(trim(lower(Full_Name)), 
     trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 1)))) > 0
       AND 
     instr(trim(lower(Full_Name)), 
     trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 2)))) > 0
    THEN 'Y'
    ELSE 'N'
END AS MATCHING_NAMES

Upvotes: 0

Multisync
Multisync

Reputation: 8797

WITH 
/*
tab AS
(
SELECT 'Smith William John' Full_Name, 'John,Smith' Partial_Name FROM dual
UNION ALL SELECT 'Timothy M Eglid', 'Eglid,timothy' FROM dual
UNION ALL SELECT 'Tim M Egli', 'Egli,Tim,M2' FROM dual
UNION ALL SELECT 'Timot M Eg', 'Eg' FROM dual
),
*/
tmp AS (
  SELECT Full_Name, Partial_Name,
         trim(CASE WHEN instr(Partial_Name, ',') = 0 THEN Partial_Name         
                   ELSE regexp_substr(Partial_Name, '[^,]+', 1, lvl+1)
              END) token
  FROM tab t CROSS JOIN (SELECT lvl FROM (SELECT LEVEL-1 lvl FROM dual 
                         CONNECT BY LEVEL <= (SELECT MAX(LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')))+1 FROM tab)))
  WHERE LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')) >= lvl
) 
SELECT Full_Name, Partial_Name
FROM tmp
GROUP BY Full_Name, Partial_Name
HAVING count(DISTINCT token) 
       = count(DISTINCT CASE WHEN REGEXP_LIKE(Full_Name, token, 'i') 
                             THEN token ELSE NULL END);                             

In the tmp each partial_name is splitted on tokens (separated by comma)

The resulting query retrieves only those rows which full_name matches all the corresponding tokens.

This query works with the dynamic number of commas in partial_name. If there can be only zero or one commas then the query will be much easier:

SELECT * FROM tab 
WHERE instr(Partial_Name, ',') > 0 
         AND REGEXP_LIKE(full_name, substr(Partial_Name, 1, instr(Partial_Name, ',')-1), 'ix')
         AND REGEXP_LIKE(full_name, substr(Partial_Name,instr(Partial_Name, ',')+1), 'ix')
      OR instr(Partial_Name, ',') = 0 
         AND REGEXP_LIKE(full_name, Partial_Name, 'ix');

Upvotes: 1

alpha bravo
alpha bravo

Reputation: 7948

you could use this pattern on the text provided - works for most engines

([^ ,]+),([^ ,]+)(?=.*\b\1\b)(?=.*\b\2\b)  

Demo

Upvotes: 2

Related Questions