Reputation: 2559

regexp_substr skips over empty positions

With this code to return the nth value in a pipe delimited string...

regexp_substr(int_record.interfaceline, '[^|]+', 1, i)

it works fine when all values are present

Mike|Male|Yes|20000|Yes so the 3rd value is Yes (correct)

but if the string is

Mike|Male||20000|Yes, the 3rd value is 20000 (not what I want)

How can I tell the expression to not skip over the empty values?

TIA

Mike

Upvotes: 1

Answers (5)

cheshbon

Reputation: 11

I had a similar problem with a CSV file thus my separator was the semicolon (;)
So I started with an expression like the following one:

 select regexp_substr(';2;;4;', '[^;]+', 1, i) from dual

letting i iterate from 1 to 5.

And of course it didn't work either.

To get the empty parts I just say they could be at the beginning (^;), or in the middle (;;) or at the end (;$). And or-ing all of this together gives:

select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, i) from dual

And believe me or not: testing for i from 1 to 5 it works!

But let's not forgot the last details: with this approach you get ; for fields that are empty originally. The next lines are showing how to get rid of them easily replacing them by empty strings(nulls):

with stage1 as (

select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, 2) as F from dual

)

select case when F like '%;' then '' else F end from stage1

Upvotes: 1

Ator Poig

Reputation: 3

As an complement to @tbone response...

Oddly, my oracle didn't recognize the blank space character in this list: [^|] In this cases can be confusing and hard to realize what is going wrong. Try with this regex ([^|]| )+. Also, to detect a posible first blank item, it is better to replace the separator with the space blank before, and not after it: ' |'

trim(regexp_substr(replace('A|test||string', '|', ' |'), '([^|]| )+', 1, 4))

Upvotes: 0

migrationlogik

Reputation: 1

You can use the following :

with l as (select 'Mike|Male||20000|Yes' str from dual)
select regexp_substr(str,'(".*"|[^|]*)(\||$)',1,level,null,1) 
from dual,l 
where level=3/*use any position*/ connect by level <= regexp_count(str,'([^|]*)(\||$)')

Upvotes: 0

Srini V

Reputation: 11365

OK. This should be the best solution for you.

SELECT
      REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                    '^([^|]*\|){2}([^|]*).*$',
                    '\2' )
          TEXT
FROM
      DUAL;

So for your problem

SELECT
      REGEXP_REPLACE ( INCOMINGSTREAMOFSTRINGS,
                    '^([^|]*\|){N-1}([^|]*).*$',
                    '\2' )
          TEXT
FROM
      DUAL;

--INCOMINGSTREAMOFSTRINGS is your complete string with delimiter

--You should pass n-1 to obtain nth position

ALTERNATE 2:

WITH T AS (SELECT 'Mike|Male||20000|Yes' X FROM DUAL)
SELECT
      X,
      REGEXP_REPLACE ( X,
                    '^([^|]*).*$',
                    '\1' )
          Y1,
      REGEXP_REPLACE ( X,
                    '^[^|]*\|([^|]*).*$',
                    '\1' )
          Y2,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){2}([^|]*).*$',
                    '\2' )
          Y3,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){3}([^|]*).*$',
                    '\2' )
          Y4,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){4}([^|]*).*$',
                    '\2' )
          Y5
FROM
      T;

ALTERNATE 3:

SELECT
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   1,
                   NULL,
                   2 )
          AS FIRST,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   2,
                   NULL,
                   2 )
          AS SECOND,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   3,
                   NULL,
                   2 )
          AS THIRD,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   4,
                   NULL,
                   2 )
          AS FOURTH,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   5,
                   NULL,
                   2 )
          AS FIFTH
FROM
      DUAL;

Upvotes: 0

tbone

Reputation: 15473

The regexp_substr works this way:

If occurrence is greater than 1, then the database searches for the second occurrence beginning with the first character following the first occurrence of pattern, and so forth. This behavior is different from the SUBSTR function, which begins its search for the second occurrence at the second character of the first occurrence.

So the pattern [^|] will look for NON pipes, meaning it will skip consecutive pipes ("||") looking for a non-pipe char.

You might try:

select trim(regexp_substr(replace('A|test||string', '|', '| '), '[^|]+', 1, 4)) from dual;

This will replace a "|" with a "| " and allow you to match based on the pattern [^|]

Upvotes: 6

regexp_substr skips over empty positions

Answers (5)

Related Questions