Reputation: 582
I need to remove certain keywords from an input string and return the new string. Keywords are stored in another table like MR, MRS, DR, PVT, PRIVATE, CO, COMPANY, LTD, LIMITED etc. They are two kind of keywords LEADING - MR, MRS, DR and TRAILING - PVT, PRIVATE, CO, COMPANY, LTD, LIMITED etc. So if Keywords is a LEADING then we have to remove that from the beginning and if it's a TRAILING then we have to remove that from the end. e.g.- MR Jones MRS COMPANY
should return JONES MRS
and MR MRS Jones PVT COMPANY
should return JONES
(As in first iteration MR
and PVT
will be trimmed and then word will become MRS JONES PVT
and in 2nd iteration it will become JONES
. Similarly MR MRS Doe PVT COMPANY LTD
will finally return DOE
.
I have to do it through PL/SQL. I have written the following code, but it is removing all the keywords if more than one keywords are there either at the beginning or at the end. Reason when I am looping through the cursor of keywords, if keyword which is not at the end and loop is already iterated though that then we can't reuse that keyword to replace. Note there can be n no of keywords either at the end or at the beginning:
CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
RETURN VARCHAR2
IS
l_name VARCHAR2 (4000);
CURSOR c
IS
SELECT *
FROM RSRV_KEY_LKUPS
WHERE ACTIVE = 'Y';
BEGIN
l_name := TRIM (p_in_name);
--Now inside the function we’ll loop through this cursor something like below and replace the value in the input name:
FOR rec IN c
LOOP
IF UPPER (rec.POSITION) = 'LEADING'
AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0
THEN --Rule 3:remove leading name
DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
l_name := LTRIM (UPPER (l_name), rec.KEY_WORD || ' ');
ELSIF UPPER (rec.POSITION) = 'TRAILING'
AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0
THEN --Rule 4:remove trailing name
DBMS_OUTPUT.PUT_LINE ('Value >>' || rec.KEY_WORD);
l_name := RTRIM (UPPER (l_name), ' ' || rec.KEY_WORD);
END IF;
l_name := l_name;
END LOOP;
l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' '); --Remove multiple spaces in a word and replace with single blank space
l_name := TRIM (l_name); --Remove the leading and trailing blank spaces
RETURN l_name;
EXCEPTION
WHEN OTHERS
THEN
raise_application_error (
-20001,
'An error was encountered - ' || SQLCODE || ' -ERROR- ' || SQLERRM);
END;
/
Thanks a lot in advance for any help.
EDIT Sample Input 1
MR MRS Jones PVT COMPANY
Output
JONES
Sample Input 2
MR MRS Doe PVT COMPANY LTD
Output
DOE
Upvotes: 3
Views: 971
Reputation: 16915
I think that it can be done with a single query (which can be wrapped in a plsql function if you insist for some reason):
with inpt as (select 'MR Jones MRS COMPANY' text from dual)
select listagg(t1.word, ' ') within group (order by ord) new_text
from (
select w.*, words.*,
sum(case when nvl(POSITION, 'TRAILING') = 'TRAILING' then 1 else 0 end) over(order by ord rows between unbounded preceding and current row) l,
sum(case when nvl(POSITION, 'LEADING') = 'LEADING' then 1 else 0 end) over(order by ord desc rows between unbounded preceding and current row) t
from
(select regexp_substr(inpt.text, '[^ ]+',1,level) word , level ord
from inpt
connect by level <= regexp_count(inpt.text, ' ') + 1) words left outer join RSRV_KEY_LKUPS w on w.KEY_WORD = words.word
) t1
where t1.t > 0 and t1.l > 0
Edit: Explain:
The 'with' clause is just to have your input string as a column (not really necessary).
The inner select which was aliased as "words" is a known Technic for splitting words to rows (note that I kept the order with the ord
column).
Now we can left outer join the words of the input string with the keywords in your table 'RSRV_KEY_LKUPS', this will give us for every word in the input either if it should be leading or trailing or null (if it's not there)
So, so far we have (for input "MR Jones MRS COMPANY"
):
KEY_WORD POSITION WORD ORD
-----------------------------------
MR LEADING MR 1
(null) (null) Jones 2
MRS LEADING MRS 3
COMPANY TRAILING COMPANY 4
Now comes the tricky part (maybe there is a better way) - we need to somehow know which word to remove, it should be all the LEADINGs until "a change" which means until we found a null or a TRAILING (top down) and all the TRAILINGs until "a change" which means null or LEADING (bottom up). So I used a known Technic for accumulative sum, as long as we're still zeroed we need to remove the row (as soon as we got "a change" we'll have some values).
That's it, all we need to do now is recollect the rows to a new string, since 11gr2 we can use LISTAGG exactly for this
Upvotes: 1
Reputation: 4485
If you want to be sure the leading keyword is found at the beginning, you should only remove it when INSTR returns 1:
Replace
IF UPPER (rec.POSITION) = 'LEADING'
AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) > 0
with
IF UPPER (rec.POSITION) = 'LEADING'
AND INSTR (UPPER (l_name), UPPER (rec.KEY_WORD || ' '), 1) = 1
and replace
ELSIF UPPER (rec.POSITION) = 'TRAILING'
AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) > 0
by
ELSIF UPPER (rec.POSITION) = 'TRAILING'
AND INSTR (UPPER (l_name), UPPER (' ' || rec.KEY_WORD), -1) = (LENGTH(l_name)-LENGTH(rec.key_word) +1)
For the problem of multiple keywords you have to loop around your for loop:
keyword_found BOOLEAN;
LOOP
keyword_found = false;
FOR rec IN c
-- when you find a keyword
keyword_found := true;
END LOOP;
EXIT WHEN NOT(keyword_found);
END LOOP;
Upvotes: 1