devoured elysium
devoured elysium

Reputation: 105107

Retrivieng specific occurrences of a given Regex with Oracle SQL

In a simplified form, I'm attempting to retrieve either the first occurrence of the '.*?=(.*?);.*' regex, or the second, or the third -- that is, either x or y or z (that is, I want to be able to hardcode in this query that I want the first, second or third values) in this following example:

select regexp_replace(
  'margin=x;margin=y;margin=z;',
  '.*?=(.*?);.*',
  '\1',
  1 -- occurrences. I thought that picking 1, 2 or 3 would solve my problem?
) from dual;

-- This returns "xyz", which is terrible. I was expecting it to return "x", in this case.

Looking at the Oracle documentation, I thought this would be relatively straightforward, as the last parameter (occurrences), apparently allows me to select which groups to take into consideration. But it doesn't! Why?

Thanks

Upvotes: 0

Views: 58

Answers (3)

user5683823
user5683823

Reputation:

Perhaps all you need is this.... The fourth parameter is NOT the occurrence but the POSITION from which the search starts. The FIFTH parameter is the occurrence.

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm

Also, are you sure you want REPLACE and not SUBSTR?

EDITED: To clarify (it seems at least one person was confused). I show a possible solution to what you need (perhaps) at the end, but first let's look at REGEXP_REPLACE. I rewrote your query to use different occurrences; I put the index in a CTE, but you can instead make idx into a bind variable, or any other mechanism you need to use. As you will see, the output makes no sense.

with t1 ( idx ) as (select 1 from dual union all select 2 from dual 
                                       union all select 3 from dual)
select idx, 
       regexp_replace('margin=x;margin=y;margin=z;', '.*?=(.*?);.*', '\1', 1, idx) as val 
from   t1;

Output:

       IDX VAL
---------- -----------------------
         1 xmargin=y;margin=z;
         2 margin=x;ymargin=z;
         3 margin=x;margin=y;z

3 rows selected.

I guess this is not what you needed - but it demonstrates what was wrong in your query. The fourth argument to REGEXP_REPLACE, 1 in all cases in the above query, is the position from which the search begins. The fifth argument, idx, is the occurrence. This query replaces the first, second, third occurrence with the subexpression - probably not what you wanted.

If you need to extract x, or y, or z, depending on the occurrence number, you must use REGEXP_SUBSTR, not REGEXP_REPLACE. Note also that I changed the match pattern - the .*? at the beginning and the .* at the end are unnecessary. If you want to find x, y or z in something like margin=x; but not in length=x; then you must make that explicit, the match pattern should be 'margin=(.*?);'.

with t1 ( idx ) as (select 1 from dual union all select 2 from dual 
                                       union all select 3 from dual)
select idx, 
       regexp_replace('margin=x;margin=y;margin=z;', '=(.*?);', '\1', 1, idx) as val
from   t1;

Output:

       IDX VAL
---------- -------
         1 x
         2 y
         3 z

Upvotes: 1

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626950

You need a regex that will match 1 to n occurrences of the whole group. E.g.

([^=]*=([^;]*);){2}.*

(replaced with \2 backreference) will get the 2nd attribute value. Your regex can also be used (though it is quite synonymous to the above pattern): (.*?=(.*?);){2}.*.

See the regex demo

If you define the index variable as IDX, you can use something like

select regexp_replace(
  'margin=x;margin=y;margin=z;',
   CONCAT('([^=]*=([^;]*);){', IDX, '}.*'),
  '\2'
) from dual;

NOTE: If you want to get an empty string as a result of trying to obtain a non-existing value, add |.* at the end of the regex:

(.*?=(.*?);){4}.*|.*

See this regex demo (with your input string, the result will be empty string).

Upvotes: 1

SomeJavaGuy
SomeJavaGuy

Reputation: 7357

i´m goingoff to another completly different solution. Would combining a hierarchial substring select with a regexp_replace be an option for your needs?

This way you could create an option to either select one or multiple values, depending on your needs. You wouldn´t need to write a concatinating regex value and you could adjust the select a bit more to your needs

select regexp_replace(subselect.val, '.*=(.*?);', '\1') -- remove "margin="
from (select regexp_substr(
             'margin=x;margin=y;margin=z;',
             '.*?=(.*?);',
             1,
             level) val,
             level lvl
      from dual
      connect by regexp_substr('margin=x;margin=y;margin=z;',
                              '.*?=(.*?);',
                              1,
                              level) is not null) subselect -- This select represents each margin=T as a single row
where lvl = 1; -- cou could define multiple values to select aswell.

Upvotes: 1

Related Questions