user1838910
user1838910

Reputation: 23

REGEXP_SUBSTR round bracket

I want to divide a semicolon separated string into its parts with PL/SQL. It’s working fine with REGEXP_SUBSTR as long as there’s no round bracket in the string.

Example:

select REGEXP_SUBSTR('A;B;C','[^(";")]+',1,1),
REGEXP_SUBSTR('A;B;C','[^(";")]+',1,2),
REGEXP_SUBSTR('A;B;C','[^(";")]+',1,3)
from dual;

Result as expected is: A B C

The result for A;B(1);C should be A B(1) C but what I get is : A B 1

select REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,1),
REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,2),
REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,3)
from dual;

That means '(' is detected as delimiter, but I do not understand this behavior. Can someone please enlighten me?

Upvotes: 2

Views: 1286

Answers (2)

Gary_W
Gary_W

Reputation: 10360

Once again I climb on my soapbox to warn folks about the dangers of using the regex of the format '[^;]+' to parse delimited strings. Repent and be saved! It does not handle NULL elements and WILL return unexpected results. See here for more info and proof. Please use this format instead and sleep easy knowing your output is accurate:

Note the 2nd element is (NULL)

SQL> with tbl(str) as (
      select 'A;;B(1);C' from dual
    )
    select regexp_substr(str, '(.*?)(;|$)', 1, level, NULL, 1)
    from tbl
    connect by level <= regexp_count(str, ';') + 1;

REGEXP_SU
---------
A

B(1)
C

SQL>

Note the NULL returned for element 2, as expected. If you use the regex format '[^;]+' and attempt to get the 2nd element, you will get 'B(1)' which is incorrect as it is the 3rd element:

DON'T USE:

SQL> with tbl(str) as (
  2    select 'A;;B(1);C' from dual
  3  )
  4  select regexp_substr(str, '[^;]+', 1, level)
  5  from tbl
  6  connect by level <= regexp_count(str, ';') + 1;

REGEXP_SU
---------
A
B(1)
C


SQL>

Look close, the NULL is last. Imagine all the incorrect reports out there. Don't let one of them be yours!

Upvotes: 1

Srini V
Srini V

Reputation: 11375

[] is a Multilingual Regular Expression Syntax which says "Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A nonmatching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list."

For example

select REGEXP_SUBSTR('"A";"B(1)";"C"','[^";"]+',1,1)
from dual;

will return A B(1) C where [^";"]+ considers " OR ; as separator

Similarly in your example [^(";")]+ considers " OR ; OR ( OR ) as separators against your expectation.

So for your expected output, you can try

select REGEXP_SUBSTR('A;B(1);C','[^;]+',1,1),
REGEXP_SUBSTR('A;B(1);C','[^;]+',1,2),
REGEXP_SUBSTR('A;B(1);C','[^;]+',1,3)
from dual;

Upvotes: 4

Related Questions