Mr. Din
Mr. Din

Reputation: 5

Oracle Query to parse Suffix from Name

I have fullname (lastname, firstname middlename including suffix) in one field. Here is the Business Rule to Parse the name

. Last Name: Characters from first position up to the comma would go in the last name. . First Name: The 1st charactersafter the comma to the next space would go to the first name. . Suffix: Once First and Last Names are derived, search for Suffix name in the remaining part of the name phrase based on values (Jr, Sr, II, III, IV, V) . Middle Name: Put rest of the characters into the middle name.

  1. LST, FRST MDL SR (Her LST is Last Name, FRST is First Name, MDL is middle Name, SR suffix)
  2. LST, FRST SR MDL MDL2 (Her LST is LastName, FRST is FirstName, "MDL MDL2" is middleName, SR suffix)
    1. LST, FRST MDL1 JR MDL2 MDL3 (Her LST is LastName, FRST is FirstName, "MDL1 MDL2 MDL3" is middleName, JR suffix)

Meaning Suffix and Middle name in various formats. Can a query be written to extract the suffix and put rest into the middle name field?

Here is my query but I am looking for more easy way and also my query is checking upto 3 spaces in middle name to find suffix.

WITH IDN_NAM AS
(
 SELECT RECORD_NUMBER, IDN.INDEX1,
  trim(regexp_substr(REPLACE(IDN.IDN_NAM,',',' '),  '[^ ]+',1,2)) AS FIRST_NAME,
  trim(regexp_substr(REPLACE(IDN.IDN_NAM,',',' '),  '[^ ]+',1,1)) AS LAST_NAME,
  CASE WHEN instr(REPLACE(IDN.IDN_NAM,',',' '),' ',1,2)+1 = 1 THEN NULL ELSE substr(REPLACE(IDN.IDN_NAM,',',' '),instr(REPLACE(IDN.IDN_NAM,',',' '),' ',1,2)+1) END AS MID_NAME,
  TRIM(IDN.IDN_NAM) AS FULL_NAME
  FROM 
  (
   SELECT IDN.RECORD_NUMBER, IDN_NAM IDN_NAM_ORIG, replace(replace(replace(REPLACE(IDN.IDN_NAM, ',', ' '),' ','<>'),'><',''),'<>',' ') IDN_NAM , IDN.INDEX1
   FROM SNM_TMP_IDENTITY_IDN_NAM IDN,
        TMP_LEGACY_EVENT T
   WHERE IDN.RECORD_NUMBER = T.RECORD_NUMBER 
    AND NOT REGEXP_LIKE(IDN_NAM,'DLE.+[[:digit:]]')
  ) IDN
)
SELECT DECODE(INDEX1, 1, 'T', 'F') MASTER_IND, FIRST_NAME, LAST_NAME,--, MID_NAME, MID_NAM1, MID_NAM2, MID_NAM3, SUF1, SUF2, SUF3,
CASE WHEN SUF3 IS NULL AND SUF2 IS NULL AND SUF1 IS NULL THEN MID_NAME
    WHEN SUF3 IS NOT NULL THEN NVL(MID_NAM1,'')||CASE WHEN MID_NAM2 IS NOT NULL THEN ' '||MID_NAM2 ELSE '' END
    WHEN SUF2 IS NOT NULL THEN NVL(MID_NAM1,'')||CASE WHEN MID_NAM3 IS NOT NULL THEN ' '||MID_NAM3 ELSE '' END
    WHEN SUF1 IS NOT NULL THEN NVL(MID_NAM2,'')||CASE WHEN MID_NAM3 IS NOT NULL THEN ' '||MID_NAM3 ELSE '' END
END MIDDLE_NAME,
NVL(SUF3, NVL(SUF2, SUF1)) NAME_SUFFIX_CD
FROM
(
SELECT NM.*,
(SELECT MAX(NAME_SUFFIX_CODE) FROM CRRMS_CODED.CODED_NAME_SUFFIX WHERE UPPER(REPLACE(NAME_SUFFIX_CODE,'.', '')) = UPPER(MID_NAM1)) SUF1,
(SELECT MAX(NAME_SUFFIX_CODE) FROM CRRMS_CODED.CODED_NAME_SUFFIX WHERE UPPER(REPLACE(NAME_SUFFIX_CODE,'.', '')) = UPPER(MID_NAM2)) SUF2,
(SELECT MAX(NAME_SUFFIX_CODE) FROM CRRMS_CODED.CODED_NAME_SUFFIX WHERE UPPER(REPLACE(NAME_SUFFIX_CODE,'.', '')) = UPPER(MID_NAM3)) SUF3
FROM
(
 SELECT I.*,
  trim(regexp_substr(MID_NAME,  '[^ ]+',1,1)) AS MID_NAM1,
  trim(regexp_substr(MID_NAME,  '[^ ]+',1,2)) AS MID_NAM2,
  trim(regexp_substr(MID_NAME,  '[^ ]+',1,3)) AS MID_NAM3
 FROM IDN_NAM I
) NM
)
;

Upvotes: 0

Views: 1499

Answers (2)

Ankit Mongia
Ankit Mongia

Reputation: 210

Din,

Took your 3 sample inputs and made a query according to your logic for getting Last Name, First Name , Suffix and Middle Name. For all 3 it worked fine.

select last_name,first_name,suffix
,trim(regexp_replace(input_string,'(^'||last_name||',|'||first_name||'|'||suffix||')',' ',1,0,'i')) as middle_name
from (
select substr(col,1,instr(col,',',1,1)-1) as last_name ,
trim(substr(col,instr(col,',',1,1)+1,instr(col,' ',1,1))) as first_name,
trim(regexp_substr(col,'(Jr|Sr|II|III|IV|V)',1,1,'i')) as suffix,
col as input_string
from (
select 'LST, FRST MDL1 JR MDL2 MDL3' as col from dual));

Upvotes: 0

Sentinel
Sentinel

Reputation: 6449

Here's a query that may fit the bill, though it goes a little beyond the problem definition in that it also handles prefixes:

with names(name) as (
            select 'LST, FRST MDL SR' from dual
  union all select 'LST, FRST SR MDL MDL2' from dual
  union all select 'LST, FRST MDL1 JR MDL2 MDL3' from dual
  union all select 'Jones, John Paul' from dual
  union all select 'Jones, Mr. John Jr Paul' from dual
  union all select 'Jones, John Paul Jr Henry' from dual
  union all select 'Henry, John Paul Sr' from dual
  union all select 'Masters, Lee II' from dual
)
select name
     , REGEXP_SUBSTR(name, '([^,]+), ?(((dr|rev|mr|mrs|ms|miss)[.]?) )?([^ ]+) ?(.*)',1,1,'i',3) pfx
     , REGEXP_SUBSTR(name, '([^,]+), ?(((dr|rev|mr|mrs|ms|miss)[.]?) )?([^ ]+) ?(.*)',1,1,'i',5) frst
     , rtrim(REGEXP_REPLACE(
       REGEXP_SUBSTR(name, '([^,]+), ?(((dr|rev|mr|mrs|ms|miss)[.]?) )?([^ ]+) ?(.*)',1,1,'i',6)
       ,'(^|[[:space:]])(jr|sr|ii|iii|iv|v)([[:space:]]|$)','\1',1,1,'i')) mdl
     , REGEXP_SUBSTR(name, '([^,]+), ?(((dr|rev|mr|mrs|ms|miss)[.]?) )?([^ ]+) ?(.*)',1,1,'i',1) Lst
     , REGEXP_SUBSTR(
       REGEXP_SUBSTR(name, '([^,]+), ?(((dr|rev|mr|mrs|ms|miss)[.]?) )?([^ ]+) ?(.*)',1,1,'i',6)
       ,'(^|[[:space:]])(jr|sr|ii|iii|iv|v)([[:space:]]|$)',1,1,'i',2) SFX
from names;

If you don't want the prefixes then this may work for you:

select name
     , REGEXP_SUBSTR(name, '([^,]+), ?([^ ]+) ?(.*)',1,1,'i',2) frst
     , rtrim(REGEXP_REPLACE(
       REGEXP_SUBSTR(name, '([^,]+), ?([^ ]+) ?(.*)',1,1,'i',3)
       ,'(^|[[:space:]])(jr|sr|ii|iii|iv|v)([[:space:]]|$)','\1',1,1,'i')) mdl
     , REGEXP_SUBSTR(name, '([^,]+), ?([^ ]+) ?(.*)',1,1,'i',1) Lst
     , REGEXP_SUBSTR(
       REGEXP_SUBSTR(name, '([^,]+), ?([^ ]+) ?(.*)',1,1,'i',3)
       ,'(^|[[:space:]])(jr|sr|ii|iii|iv|v)([[:space:]]|$)',1,1,'i',2) SFX
from names;

The same regular expression is used throughout to identify and extract each part of the name:

'([^,]+), ?([^ ]+) ?(.*)'
 |     |   |     |  +--+ -> 3) Middle Names (including Suffix)
 |     |   +-----+ -------> 2) First Name
 +-----+ -----------------> 1) Last Name

The First and Last Names are straight forward. The Middle Names and Suffix, however, need a little extra work to manage. To get just the middle names or suffix a second regex is needed:

'(^|[[:space:]])(jr|sr|ii|iii|iv|v)([[:space:]]|$)'

By using the above regex along with the REGEXP_REPLACE function the suffix can be removed leaving only the middle names. Similarly using the same regexp with the REGEXP_SUBSTR function the suffix itself can be retrieved.

Upvotes: 1

Related Questions