Reputation: 7778
I need to check if a partial name matches full name. For example:
Partial_Name | Full_Name
--------------------------------------
John,Smith | Smith William John
Eglid,Timothy | Timothy M Eglid
I have no clue how to approach this type of matching.
Another thing is that name and last name may come in the wrong order, making it harder.
I could do something like this, but this only works if names are in the same order and 100% match
decode(LOWER(REGEXP_REPLACE(Partial_Name,'[^a-zA-Z'']','')), LOWER(REGEXP_REPLACE(Full_Name,'[^a-zA-Z'']','')), 'Same', 'Different')
Upvotes: 1
Views: 174
Reputation: 7778
This is what I ended up doing... Not sure if this is the best approach. I split partials by comma and check if first name present in full name and last name present in full name. If both are present then match.
CASE
WHEN
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 1)))) > 0
AND
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 2)))) > 0
THEN 'Y'
ELSE 'N'
END AS MATCHING_NAMES
Upvotes: 0
Reputation: 8797
WITH
/*
tab AS
(
SELECT 'Smith William John' Full_Name, 'John,Smith' Partial_Name FROM dual
UNION ALL SELECT 'Timothy M Eglid', 'Eglid,timothy' FROM dual
UNION ALL SELECT 'Tim M Egli', 'Egli,Tim,M2' FROM dual
UNION ALL SELECT 'Timot M Eg', 'Eg' FROM dual
),
*/
tmp AS (
SELECT Full_Name, Partial_Name,
trim(CASE WHEN instr(Partial_Name, ',') = 0 THEN Partial_Name
ELSE regexp_substr(Partial_Name, '[^,]+', 1, lvl+1)
END) token
FROM tab t CROSS JOIN (SELECT lvl FROM (SELECT LEVEL-1 lvl FROM dual
CONNECT BY LEVEL <= (SELECT MAX(LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')))+1 FROM tab)))
WHERE LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')) >= lvl
)
SELECT Full_Name, Partial_Name
FROM tmp
GROUP BY Full_Name, Partial_Name
HAVING count(DISTINCT token)
= count(DISTINCT CASE WHEN REGEXP_LIKE(Full_Name, token, 'i')
THEN token ELSE NULL END);
In the tmp
each partial_name
is splitted on tokens (separated by comma)
The resulting query retrieves only those rows which full_name
matches all the corresponding tokens.
This query works with the dynamic number of commas in partial_name
. If there can be only zero or one commas then the query will be much easier:
SELECT * FROM tab
WHERE instr(Partial_Name, ',') > 0
AND REGEXP_LIKE(full_name, substr(Partial_Name, 1, instr(Partial_Name, ',')-1), 'ix')
AND REGEXP_LIKE(full_name, substr(Partial_Name,instr(Partial_Name, ',')+1), 'ix')
OR instr(Partial_Name, ',') = 0
AND REGEXP_LIKE(full_name, Partial_Name, 'ix');
Upvotes: 1
Reputation: 7948
you could use this pattern on the text provided - works for most engines
([^ ,]+),([^ ,]+)(?=.*\b\1\b)(?=.*\b\2\b)
Upvotes: 2