tonyf
tonyf

Reputation: 35587

Extracting strings using Oracle REGEXP_SUBSTR

I am using REGEXP_SUBSTR in Oracle 11g and I am having difficulty trying to extract the following strings.

My query is:

SELECT regexp_substr('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', '[^CN=]*\,', 1, rownum) line
 FROM dual
 CONNECT BY LEVEL <= length('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,') - 
                     length(REPLACE('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', ',', ''))

From this query, I am having issues trying to match on exact string 'CN=' as from this query, I need the output to appear as follows:

CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,

And in this format, with the comma at the end.

The way I am doing it at the moment is chopping off the "CN=" but I actually require this part.

Upvotes: 1

Views: 4475

Answers (2)

spencer7593
spencer7593

Reputation: 108500

I think this will return the resultset you are looking for:

SELECT REGEXP_SUBSTR(d.s,'CN=.*?,', 1, ROWNUM) line
  FROM (SELECT 'CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,'
        AS s FROM dual) d
CONNECT BY LEVEL <= LENGTH(d.s) - LENGTH(REPLACE(d.s,',',''))

The regular expression trick used here is to specify the ? modifier (following the .*) to make the match "non-greedy". The default match (without the ? modifier) is "greedy" in that it will match as much of the string as possible. In your case, you want the match to end at the first comma found. The intent here is to match literal string 'CN=' followed by any number of characters (zero, one or more) up to the first comma encountered.

This will work in Oracle 10g as well as 11g.

In 11g, the REGEXP_COUNT function can replace your "count of comman" calculation of occurrences.

CONNECT BY LEVEL <= REGEXP_COUNT(d.s,'CN=.*?,')

(BTW... by using a subquery to return the literal string, the literal string only has to be specified once. That makes it much easier to change the string for testing, rather than having to change it in multiple places.)


Addendum:

I can confirm that the comma is included in the returned value. Sample output:

LINE
-----------------------
CN=aTIGERAdmin-Admin,                                                 
CN=D0902498,                                                          
CN=ea90045052,                                                        
CN=aTIGERCall-Admin, 

Upvotes: 5

Tebbe
Tebbe

Reputation: 1372

I'm not an LDAP master, but will the regular expression CN=[^,]+ (C, then N, then equals sign, greedily followed by more than one non-comma) work for you?

Also, do you know about REGEXP_COUNT, new in 11g?

SQL> SELECT REGEXP_SUBSTR('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+', 1, ROWNUM) line
  2   FROM dual
  3   CONNECT BY LEVEL <= REGEXP_COUNT('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+')
  4  /

LINE
----------------------------------------------------------------------------------------------------
CN=aTIGERAdmin-Admin
CN=D0902498
CN=ea90045052
CN=aTIGERCall-Admin

SQL>

Upvotes: 0

Related Questions