Reputation: 69
I have a function that compares values from STR_A
with STR_B
, the 'dream' is that any matching characters between STR_A
and STR_B
will be removed from STR_A
.
For example; STR_A = 'LTD'
and STR_B = 'LIMITED'
therefore the results will be null
.
Order is important, so if STR_A = 'LDT'
and STR_B = 'LIMITED'
the result will be 'T'
.
Another example; STR_A = 'AUSTIN'
and STR_B = 'ADVERTISING'
the result will be 'UT'
Note; the number of characters in STR_A
will always be less than or equal to STR_B
.
Also, a character can only be used once, therefore if STR_A = 'LLTD'
and STR_B = 'LIMITED'
, the result will be 'L'
.
The function I have is as follows;
create or replace FUNCTION CP_RDN_REMSTR(
S1 VARCHAR2,
S2 VARCHAR2)
RETURN VARCHAR2
IS
LEN INTEGER := NVL(LENGTH(S2),0);
OUTSTR VARCHAR2(32767) := S1;
POS INTEGER := 1;
IND INTEGER := POS;
BEGIN
FOR I IN 1..LEN
LOOP
POS := INSTR(SUBSTR(OUTSTR,POS),SUBSTR(S2,I,1));
IF POS > 0 THEN
OUTSTR := SUBSTR(OUTSTR,1,POS-IND)||SUBSTR(OUTSTR,POS+IND);
END IF;
END LOOP;
RETURN OUTSTR;
END;
However, using the above isn't giving me the desired results;
SELECT CP_RDN_REMSTR('LTD','LIMITED') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('LDT','LIMITED') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('AUSTIN','ADVERTISING') AS STR_A FROM DUAL
UNION ALL
SELECT CP_RDN_REMSTR('ADP','ADVANCED') AS STR_A FROM DUAL
The results are as follows;
('LTD','LIMITED') = NULL
('LDT','LIMITED') = 'D'
('AUSTIN','ADVERTISING') = NULL
('ADP','ADVANCED') = 'P'
The expected results are;
('LTD','LIMITED') = NULL
('LDT','LIMITED') = 'T'
('AUSTIN','ADVERTISING') = 'UT'
('ADP','ADVANCED') = 'P'
As always, many thanks in advance.
Upvotes: 3
Views: 391
Reputation:
Here is a solution using a recursive factored subquery (recursive WITH clause), available since Oracle 11.2. If your Oracle version is older, something similar can be hacked using PL/SQL code.
We inspect the letters from the LHS string, one at a time. We check to see if we can still find a match, using REGEXP_LIKE (and a recursive pattern, starting with NULL). At each step, if a match is found, we expand the pattern; if a match is not found, we expand the UNM string (for "unmatched").
There is no requirement that the LHS be shorter than the RHS, it makes no difference. The solution also handles NULL correctly on either side (I added data to test that as well).
Note the test data created in the first factored subquery - it is for testing only, it is not part of the solution.
with
inputs ( lhs, rhs ) as (
select 'LTD' ,'LIMITED' from dual union all
select 'LDT' ,'LIMITED' from dual union all
select 'AUSTIN','ADVERTISING' from dual union all
select 'ADP' ,'ADVANCED' from dual union all
select 'ALPHA' , null from dual union all
select null ,'BETA' from dual
),
r ( lvl, lhs, rhs, unm, pattrn, next_letter ) as (
select 1, lhs, rhs, null, null, substr(lhs, 1, 1)
from inputs
union all
select lvl + 1, lhs, rhs,
unm || case when regexp_like(rhs, pattrn || '.*' || next_letter)
then null else next_letter end,
pattrn || case when regexp_like(rhs, pattrn || '.*' || next_letter)
then '.*' || next_letter end,
substr(lhs, lvl + 1, 1)
from r
where next_letter is not null
)
cycle lvl set cycle to 1 default 0
select lhs, rhs, unm
from r
where next_letter is null
;
Output (from the simulated inputs):
LHS RHS UNM
------ ----------- -----
BETA
LTD LIMITED
LDT LIMITED T
ADP ADVANCED P
ALPHA ALPHA
AUSTIN ADVERTISING UT
Note: Here is one way to see what the recursive query does. At the very end of the solution, I just selected the columns and the rows I needed from r
. Instead, replace those last few lines with
select *
from r
order by lhs, lvl
;
and look at THAT output.
EDIT
Fun problem. Here is a solution that should work quite a bit faster than the first version. It uses only standard string functions (INSTR and SUBSTR), which are quite a bit faster than regular expressions. Moreover, working from left to right, letters that have already been inspected don't need to be carried forward (resulting in longer and more complex matching pattern); instead, it is possible to check just one letter at a time, and to chop off initial segments from the RHS string at each step, making each search shorter.
with
inputs ( lhs, rhs ) as (
select 'LTD' ,'LIMITED' from dual union all
select 'LDT' ,'LIMITED' from dual union all
select 'AUSTIN','ADVERTISING' from dual union all
select 'ADP' ,'ADVANCED' from dual union all
select 'ALPHA' , null from dual union all
select null ,'BETA' from dual
),
r ( lhs, rhs, next_letter, unm, new_lhs, new_rhs ) as (
select lhs, rhs, substr(lhs, 1, 1), null, lhs, rhs
from inputs
union all
select lhs, rhs, substr(new_lhs, 2, 1),
unm || case when nvl(instr(new_rhs, next_letter), 0) = 0
then next_letter end,
substr(new_lhs, 2),
substr(new_rhs, nvl(instr(new_rhs, next_letter), 0) + 1)
from r
where next_letter is not null
)
cycle new_lhs set cycle to 1 default 0
select lhs, rhs, unm
from r
where next_letter is null
;
Upvotes: 2
Reputation: 4962
Based on your comment about order being important, it sounds like you want to loop through each character in S2
and only remove it from S1
if it's the leading character of S1
.
The following satisfies your first, second and fourth examples, but causes the third example to not make sense:
CREATE OR REPLACE FUNCTION CP_RDN_REMSTR (S1 VARCHAR2, S2 VARCHAR2) RETURN VARCHAR2
IS
LEN INTEGER := NVL (LENGTH (S2), 0);
OUTSTR VARCHAR2 (32767) := S1;
POS INTEGER := 1;
IND INTEGER := POS;
BEGIN
FOR I IN 1 .. LEN
LOOP
POS := 0;
POS := INSTR (SUBSTR (OUTSTR, POS), SUBSTR (S2, I, 1));
IF POS = 1 THEN
OUTSTR := SUBSTR (OUTSTR, 1, POS - IND) || SUBSTR (OUTSTR, POS + IND);
END IF;
END LOOP;
RETURN OUTSTR;
END CP_RDN_REMSTR;
Output:
('LTD','LIMITED') = NULL
('LDT','LIMITED') = 'T'
('AUSTIN','ADVERTISING') = 'USTIN'
('ADP','ADVANCED') = 'P'
Upvotes: 1