Ankit
Ankit

Reputation: 1250

Oracle REGEX_SUBSTR Not Honoring null values

I have an issue of regex_substr not honoring the null value.

select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1)    AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2)    AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3)    AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4)    AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6)    AS pub_indctr
from dual;

If the phn_end_dt is null and pub_indctr is not null, the values of pub_indctr are shifted to phn_end_dt.

Result:-

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P 

While it should be

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14               P 

Any suggestions ?

Upvotes: 3

Views: 6599

Answers (6)

Rajinder Nagpal
Rajinder Nagpal

Reputation: 11

I have a generic use case where I don't know the exact columns coming in the string. I thus used below code which solved the purpose.

function substring_specific_occurence(p_string varchar2
                                    ,p_delimiter varchar2
                                    ,p_occurence number) return varchar2
is 
    l_output varchar2(2000);
    g_miss_char     varchar2(20) := 'fdkjkjhkuhhf7';
    l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );

begin 

    while  (l_string like '%'||p_delimiter||p_delimiter||'%' )
    loop 
        l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
    end loop;

    select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence) 
    into l_output
    from dual;

    return replace(l_output,g_miss_char);

end substring_specific_occurence;

Upvotes: 1

Ankit
Ankit

Reputation: 1250

Thanks for pointing me in the right direction, I have used this to solve the issue.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt , REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual );

Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Upvotes: 1

Gary_W
Gary_W

Reputation: 10360

I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end
_dt
  2  from dual;

P
-

You need to do this instead I believe (works on 11g):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6,
NULL, 1) phn_end_dt
  2  from dual;

P
-
P

I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls

Upvotes: 4

Patrick Bacon
Patrick Bacon

Reputation: 4660

The typical csv parsing approach is as follows:

WITH t(csv_str) AS
  ( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
  UNION ALL
  SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
  )
SELECT LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t

I like to place a comma preceeding my csv and then I would count the commas with the non-comma pattern.

Explanation of the search pattern

The search pattern looks for the nth substring (nth corresponds with the nth element in the csv) which has the following:

-The pattern begins with a ','

-Next, it is followed by the pattern, '[^,]'. This is just a non-matching list expression. The caret, ^, conveys that the characters following in the list should not be matched.

-This non-matching list of characters has the quantifier, *, which means this can occur 0 or more times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a match is found, I would also use the LTRIM function to remove the comma after I used the reg expression.

What is nice about this approach is the occurrence of the search pattern will always correspond with the occurences of the comma.

Upvotes: 2

neshkeev
neshkeev

Reputation: 6476

You can solve your task like this:

with t(val) as (
  select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
  select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
     , substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
     , substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
     , substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
     , substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
     , substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
  from t1

     A      B   C       D       E   F
-------------------------------------
2035197553  2   S   14-JUN-14   -   P

Upvotes: 2

Avinash Raj
Avinash Raj

Reputation: 174756

You need to change this line,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,

to,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5)    AS phn_end_dt,
                                                   ^

[^,]+ means it matches any character not of , one or more times. [^,]* means it matches any character not of , zero or more times. So [^,]+ assumes that there must be a single character not of , would present. But really there isn't , by changing + to * makes the regex engine to match a empty character.

Upvotes: 1

Related Questions