MAndrews
MAndrews

Reputation: 69

Oracle PL/SQL - String Match and Removal Function

I have a function that compares values from STR_A with STR_B, the 'dream' is that any matching characters between STR_A and STR_B will be removed from STR_A.

For example; STR_A = 'LTD' and STR_B = 'LIMITED' therefore the results will be null.

Order is important, so if STR_A = 'LDT' and STR_B = 'LIMITED' the result will be 'T'.

Another example; STR_A = 'AUSTIN' and STR_B = 'ADVERTISING' the result will be 'UT'

Note; the number of characters in STR_A will always be less than or equal to STR_B.

Also, a character can only be used once, therefore if STR_A = 'LLTD' and STR_B = 'LIMITED', the result will be 'L'.

The function I have is as follows;

create or replace FUNCTION CP_RDN_REMSTR(
  S1 VARCHAR2,
  S2 VARCHAR2)
 RETURN VARCHAR2
IS
 LEN INTEGER := NVL(LENGTH(S2),0);
 OUTSTR VARCHAR2(32767) := S1;
 POS INTEGER := 1;
 IND INTEGER := POS;
BEGIN
 FOR I IN 1..LEN
 LOOP
  POS := INSTR(SUBSTR(OUTSTR,POS),SUBSTR(S2,I,1));
  IF POS > 0 THEN
   OUTSTR := SUBSTR(OUTSTR,1,POS-IND)||SUBSTR(OUTSTR,POS+IND);
  END IF;
 END LOOP;
RETURN OUTSTR;
END;

However, using the above isn't giving me the desired results;

SELECT CP_RDN_REMSTR('LTD','LIMITED') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('LDT','LIMITED') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('AUSTIN','ADVERTISING') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('ADP','ADVANCED') AS STR_A FROM DUAL

The results are as follows;

('LTD','LIMITED')        = NULL
('LDT','LIMITED')        = 'D'
('AUSTIN','ADVERTISING') = NULL
('ADP','ADVANCED')       = 'P'

The expected results are;

('LTD','LIMITED')        = NULL
('LDT','LIMITED')        = 'T'
('AUSTIN','ADVERTISING') = 'UT'
('ADP','ADVANCED')       = 'P'

As always, many thanks in advance.

Upvotes: 3

Views: 391

Answers (2)

user5683823
user5683823

Reputation:

Here is a solution using a recursive factored subquery (recursive WITH clause), available since Oracle 11.2. If your Oracle version is older, something similar can be hacked using PL/SQL code.

We inspect the letters from the LHS string, one at a time. We check to see if we can still find a match, using REGEXP_LIKE (and a recursive pattern, starting with NULL). At each step, if a match is found, we expand the pattern; if a match is not found, we expand the UNM string (for "unmatched").

There is no requirement that the LHS be shorter than the RHS, it makes no difference. The solution also handles NULL correctly on either side (I added data to test that as well).

Note the test data created in the first factored subquery - it is for testing only, it is not part of the solution.

with
     inputs ( lhs, rhs ) as (
       select 'LTD'   ,'LIMITED'     from dual union all
       select 'LDT'   ,'LIMITED'     from dual union all
       select 'AUSTIN','ADVERTISING' from dual union all
       select 'ADP'   ,'ADVANCED'    from dual union all
       select 'ALPHA' , null         from dual union all
       select null    ,'BETA'        from dual
     ),
     r ( lvl, lhs, rhs, unm, pattrn, next_letter ) as (
       select  1, lhs, rhs, null, null, substr(lhs, 1, 1)
         from  inputs
       union all
       select  lvl + 1, lhs, rhs,
               unm    || case when regexp_like(rhs, pattrn || '.*' || next_letter)
                              then null else next_letter end,
               pattrn || case when regexp_like(rhs, pattrn || '.*' || next_letter)
                              then '.*' || next_letter end,
               substr(lhs, lvl + 1, 1)
         from  r
         where next_letter is not null
     )
     cycle lvl set cycle to 1 default 0
select lhs, rhs, unm
from   r
where  next_letter is null
;

Output (from the simulated inputs):

LHS     RHS          UNM  
------  -----------  -----
        BETA                             
LTD     LIMITED                          
LDT     LIMITED      T                   
ADP     ADVANCED     P                   
ALPHA                ALPHA               
AUSTIN  ADVERTISING  UT   

Note: Here is one way to see what the recursive query does. At the very end of the solution, I just selected the columns and the rows I needed from r. Instead, replace those last few lines with

select   *
from     r
order by lhs, lvl
;

and look at THAT output.

EDIT

Fun problem. Here is a solution that should work quite a bit faster than the first version. It uses only standard string functions (INSTR and SUBSTR), which are quite a bit faster than regular expressions. Moreover, working from left to right, letters that have already been inspected don't need to be carried forward (resulting in longer and more complex matching pattern); instead, it is possible to check just one letter at a time, and to chop off initial segments from the RHS string at each step, making each search shorter.

with
     inputs ( lhs, rhs ) as (
       select 'LTD'   ,'LIMITED'     from dual union all
       select 'LDT'   ,'LIMITED'     from dual union all
       select 'AUSTIN','ADVERTISING' from dual union all
       select 'ADP'   ,'ADVANCED'    from dual union all
       select 'ALPHA' , null         from dual union all
       select null    ,'BETA'        from dual
     ),
     r ( lhs, rhs, next_letter, unm, new_lhs, new_rhs ) as (
       select  lhs, rhs, substr(lhs, 1, 1), null, lhs, rhs
         from  inputs
       union all
       select  lhs, rhs, substr(new_lhs, 2, 1),
               unm || case when nvl(instr(new_rhs, next_letter), 0) = 0 
                           then next_letter end,
               substr(new_lhs, 2),
               substr(new_rhs, nvl(instr(new_rhs, next_letter), 0) + 1) 
         from  r
         where next_letter is not null
    )
    cycle new_lhs set cycle to 1 default 0
select lhs, rhs, unm
from   r
where  next_letter is null
;

Upvotes: 2

wweicker
wweicker

Reputation: 4962

Based on your comment about order being important, it sounds like you want to loop through each character in S2 and only remove it from S1 if it's the leading character of S1.

The following satisfies your first, second and fourth examples, but causes the third example to not make sense:

CREATE OR REPLACE FUNCTION CP_RDN_REMSTR (S1 VARCHAR2, S2 VARCHAR2) RETURN VARCHAR2
IS
   LEN      INTEGER := NVL (LENGTH (S2), 0);
   OUTSTR   VARCHAR2 (32767) := S1;
   POS      INTEGER := 1;
   IND      INTEGER := POS;
BEGIN
   FOR I IN 1 .. LEN
   LOOP
      POS := 0;
      POS := INSTR (SUBSTR (OUTSTR, POS), SUBSTR (S2, I, 1));

      IF POS = 1 THEN
         OUTSTR := SUBSTR (OUTSTR, 1, POS - IND) || SUBSTR (OUTSTR, POS + IND);
      END IF;
   END LOOP;

   RETURN OUTSTR;
END CP_RDN_REMSTR;

Output:

('LTD','LIMITED')        = NULL 
('LDT','LIMITED')        = 'T'
('AUSTIN','ADVERTISING') = 'USTIN' 
('ADP','ADVANCED')       = 'P'

Upvotes: 1

Related Questions