Kyle H
Kyle H

Reputation: 3295

Oracle Regexp - substr by multiple spaces but not single space

Lets say I have a string-

"a   b      cc  d d  e"

I'd like to be able to do

regexp_substr('a   b      cc  d d  e', ??, 1, 4)

And have it return "d d".

I was using "[^ ]+" as my pattern, but it splits single spaces which I don't want.

Upvotes: 1

Views: 879

Answers (3)

Sentinel
Sentinel

Reputation: 6449

This one will do the trick:

select regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 4)

If you want to parse out all the discreet values you could use this:

with sample_data as (
  select 'a   b      cc  d d  e' str from dual union all
  select ' a   b      cc  d dd d  e' str from dual
)
, recurse(str, seq, nxt, tok) as (
select str
     , 1
     , regexp_instr (str, '[ ]{2,}',1,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 1)
 from sample_data
union all
select str
     , seq+1
     , regexp_instr(str, '[ ]{2,}',nxt,1,1)
     , regexp_substr(str, '([^ ]([ ]?[^ ])*)',nxt, 1)
 from recurse
 where nxt > 0
)
select * from recurse order by str, seq;

STR                              SEQ        NXT TOK                     
------------------------- ---------- ---------- -------------------------
 a   b      cc  d dd d  e          1          6 a                        
 a   b      cc  d dd d  e          2         13 b                        
 a   b      cc  d dd d  e          3         17 cc                       
 a   b      cc  d dd d  e          4         25 d dd d                   
 a   b      cc  d dd d  e          5          0 e                        

a   b      cc  d d  e              1          5 a                        
a   b      cc  d d  e              2         12 b                        
a   b      cc  d d  e              3         16 cc                       
a   b      cc  d d  e              4         21 d d                      
a   b      cc  d d  e              5          0 e                        

 10 rows selected 

Upvotes: 0

user5683823
user5683823

Reputation:

Assuming you are looking for the fourth occurrence of one or more consecutive spaces, and you want to return that PLUS the surrounding non-space characters (and allowing for one or more spaces at the beginning or the end of the string, in which case the result will not have non-space characters on that side):

with
     inputs ( str, occ ) as (
       select 'a   b      cc  d d  e', 4 from dual
     )
select regexp_substr(str, '([^ ]+|^)[ ]+', 1, occ) || 
          regexp_substr(str,'[ ]+([^ ]+|$)', 1, occ, null, 1) as token
from inputs
;

Specifically, this will produce the following output for values 1, 2, 3, 4, 5 of occ:

'a b', 'b cc', 'cc d', 'd d', 'd e'.

Note that as soon as one match is found, the search for the next match starts at the character after the match that has already been found. In my interpretation of the problem, the non-space characters are included in overlapping candidate matches, so it is not possible (I don't think) to do this in a single pass of regexp_substr.

I input the string and the occurrence (4) in the CTE at the top; that is part of setting up test data, NOT part of the solution.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269973

I think this will do what you want:

select regexp_substr(col, '([^ ]*[ ]?[^ ]+)+', 1, 4)

Upvotes: 1

Related Questions