Reputation: 3295
Lets say I have a string-
"a b cc d d e"
I'd like to be able to do
regexp_substr('a b cc d d e', ??, 1, 4)
And have it return "d d".
I was using "[^ ]+" as my pattern, but it splits single spaces which I don't want.
Upvotes: 1
Views: 879
Reputation: 6449
This one will do the trick:
select regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 4)
If you want to parse out all the discreet values you could use this:
with sample_data as (
select 'a b cc d d e' str from dual union all
select ' a b cc d dd d e' str from dual
)
, recurse(str, seq, nxt, tok) as (
select str
, 1
, regexp_instr (str, '[ ]{2,}',1,1,1)
, regexp_substr(str, '([^ ]([ ]?[^ ])*)',1, 1)
from sample_data
union all
select str
, seq+1
, regexp_instr(str, '[ ]{2,}',nxt,1,1)
, regexp_substr(str, '([^ ]([ ]?[^ ])*)',nxt, 1)
from recurse
where nxt > 0
)
select * from recurse order by str, seq;
STR SEQ NXT TOK
------------------------- ---------- ---------- -------------------------
a b cc d dd d e 1 6 a
a b cc d dd d e 2 13 b
a b cc d dd d e 3 17 cc
a b cc d dd d e 4 25 d dd d
a b cc d dd d e 5 0 e
a b cc d d e 1 5 a
a b cc d d e 2 12 b
a b cc d d e 3 16 cc
a b cc d d e 4 21 d d
a b cc d d e 5 0 e
10 rows selected
Upvotes: 0
Reputation:
Assuming you are looking for the fourth occurrence of one or more consecutive spaces, and you want to return that PLUS the surrounding non-space characters (and allowing for one or more spaces at the beginning or the end of the string, in which case the result will not have non-space characters on that side):
with
inputs ( str, occ ) as (
select 'a b cc d d e', 4 from dual
)
select regexp_substr(str, '([^ ]+|^)[ ]+', 1, occ) ||
regexp_substr(str,'[ ]+([^ ]+|$)', 1, occ, null, 1) as token
from inputs
;
Specifically, this will produce the following output for values 1, 2, 3, 4, 5 of occ
:
'a b'
, 'b cc'
, 'cc d'
, 'd d'
, 'd e'
.
Note that as soon as one match is found, the search for the next match starts at the character after the match that has already been found. In my interpretation of the problem, the non-space characters are included in overlapping candidate matches, so it is not possible (I don't think) to do this in a single pass of regexp_substr.
I input the string and the occurrence (4) in the CTE at the top; that is part of setting up test data, NOT part of the solution.
Upvotes: 0
Reputation: 1269973
I think this will do what you want:
select regexp_substr(col, '([^ ]*[ ]?[^ ]+)+', 1, 4)
Upvotes: 1