Sid
Sid

Reputation: 582

String removal in Oracle PL/SQL

I need to remove certain keywords from an input string and return the new string. Keywords are stored in another table like MR, MRS, DR, PVT, PRIVATE, CO, COMPANY, LTD, LIMITED etc. They are two kind of keywords LEADING - MR, MRS, DR and TRAILING - PVT, PRIVATE, CO, COMPANY, LTD, LIMITED etc. So if Keywords is a LEADING then we have to remove that from the beginning and if it's a TRAILING then we have to remove that from the end. e.g.- MR Jones MRS COMPANY should return JONES MRS and MR MRS Jones PVT COMPANY should return JONES (As in first iteration MR and PVT will be trimmed and then word will become MRS JONES PVT and in 2nd iteration it will become JONES. Similarly MR MRS Doe PVT COMPANY LTD will finally return DOE.

I have to do it through PL/SQL. I have written the following code, but it is removing all the keywords if more than one keywords are there either at the beginning or at the end. Reason when I am looping through the cursor of keywords, if keyword which is not at the end and loop is already iterated though that then we can't reuse that keyword to replace. Note there can be n no of keywords either at the end or at the beginning:

CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
   RETURN VARCHAR2
IS
   l_name   VARCHAR2 (4000);

   CURSOR c
   IS
      SELECT *
        FROM RSRV_KEY_LKUPS
       WHERE ACTIVE = 'Y';
BEGIN
    l_name := TRIM (p_in_name); 

   --Now inside the function we’ll loop through this cursor something like below and replace the value in the input name:

   FOR rec IN c
   LOOP
      IF     UPPER (rec.POSITION) = 'LEADING'
         AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0
      THEN                                        --Rule 3:remove leading name
         DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
         l_name := LTRIM (UPPER (l_name), rec.KEY_WORD || ' ');

      ELSIF     UPPER (rec.POSITION) = 'TRAILING'
            AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0
      THEN                                       --Rule 4:remove trailing name
         DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
         l_name := RTRIM (UPPER (l_name), ' ' || rec.KEY_WORD);      
      END IF;

      l_name := l_name;
   END LOOP;

   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' '); --Remove multiple spaces in a word and replace with single blank space
   l_name := TRIM (l_name); --Remove the leading and trailing blank spaces
   RETURN l_name;
EXCEPTION
   WHEN OTHERS
   THEN
      raise_application_error (
         -20001,
         'An error was encountered - ' || SQLCODE || ' -ERROR- ' || SQLERRM);
END;
/

Thanks a lot in advance for any help.

EDIT Sample Input 1

MR MRS Jones PVT COMPANY 

Output

JONES

Sample Input 2

MR MRS Doe PVT COMPANY LTD 

Output

DOE

Upvotes: 3

Views: 971

Answers (2)

A.B.Cade
A.B.Cade

Reputation: 16915

I think that it can be done with a single query (which can be wrapped in a plsql function if you insist for some reason):

Here is a sqlfiddle demo

with inpt as (select 'MR Jones MRS COMPANY' text from dual)
select listagg(t1.word, ' ') within group (order by ord) new_text 
from (
select w.*, words.*, 
sum(case when nvl(POSITION, 'TRAILING') = 'TRAILING'  then 1 else 0 end) over(order by ord rows between unbounded preceding and current row) l,
sum(case when nvl(POSITION, 'LEADING') = 'LEADING' then 1 else 0 end) over(order by ord desc rows between unbounded preceding and current row) t
from 
(select regexp_substr(inpt.text, '[^ ]+',1,level) word , level ord 
from inpt 
connect by level <= regexp_count(inpt.text, ' ') + 1) words left outer join RSRV_KEY_LKUPS w on w.KEY_WORD = words.word
 ) t1
where t1.t > 0 and t1.l > 0

Edit: Explain:

The 'with' clause is just to have your input string as a column (not really necessary).

The inner select which was aliased as "words" is a known Technic for splitting words to rows (note that I kept the order with the ord column).

Now we can left outer join the words of the input string with the keywords in your table 'RSRV_KEY_LKUPS', this will give us for every word in the input either if it should be leading or trailing or null (if it's not there)

So, so far we have (for input "MR Jones MRS COMPANY"):

KEY_WORD    POSITION    WORD    ORD 
----------------------------------- 
MR          LEADING     MR      1  
(null)      (null)      Jones   2  
MRS         LEADING     MRS     3  
COMPANY     TRAILING    COMPANY 4 

Now comes the tricky part (maybe there is a better way) - we need to somehow know which word to remove, it should be all the LEADINGs until "a change" which means until we found a null or a TRAILING (top down) and all the TRAILINGs until "a change" which means null or LEADING (bottom up). So I used a known Technic for accumulative sum, as long as we're still zeroed we need to remove the row (as soon as we got "a change" we'll have some values).

That's it, all we need to do now is recollect the rows to a new string, since 11gr2 we can use LISTAGG exactly for this

Upvotes: 1

Conffusion
Conffusion

Reputation: 4485

If you want to be sure the leading keyword is found at the beginning, you should only remove it when INSTR returns 1:

Replace

IF UPPER (rec.POSITION) = 'LEADING'
   AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0

with

IF UPPER (rec.POSITION) = 'LEADING'
   AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) = 1

and replace

  ELSIF     UPPER (rec.POSITION) = 'TRAILING'
        AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0

by

  ELSIF UPPER (rec.POSITION) = 'TRAILING'
        AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) = (LENGTH(l_name)-LENGTH(rec.key_word) +1)

For the problem of multiple keywords you have to loop around your for loop:

keyword_found BOOLEAN;
LOOP
  keyword_found = false;
  FOR rec IN c
       -- when you find a keyword
       keyword_found := true;
  END LOOP;
  EXIT WHEN NOT(keyword_found);
END LOOP;

Upvotes: 1

Related Questions