Optimizing Regexp in Oracle

Question

I have a working script:

Select col from table where regexp_like (col,'^noun[ |s |es ]| noun[ |s |es ]|noun[ |s |es ]$','i');

Can I optimize my three blocks in REGEXP to a shorter form?

Good:
noun abcd
nouns abcd
abcd noun abcd
abcd nounes abcd
abcd noun

Wrong:
nounse abcd
abcd anouns abcd
abcd inoun

LukStorms · Accepted Answer

In most regex engines a word boundary \b can be used to get the seperate words.
But in Oracle regex you need to do that differently.

(^|\s)noun(e?s)?(\s|$)

(^|\s) : start of string or whitespace
(e?s)? : optional group that has 'es' or 's'
(\s|$) : whitespace or end of string

Setting up test data:

create table test_table (id number(8,0), col varchar2(30), matchexpected char(1));

insert into test_table (id, col, matchexpected) values (1,'noun abcd','Y');
insert into test_table (id, col, matchexpected) values (2,'nouns abcd','Y');
insert into test_table (id, col, matchexpected) values (3,'abcd NOUN abcd','Y');
insert into test_table (id, col, matchexpected) values (4,'abcd nounEs abcd','Y');
insert into test_table (id, col, matchexpected) values (5,'abcd noun','Y');

insert into test_table (id, col, matchexpected) values (6,'nounse abcd','N');
insert into test_table (id, col, matchexpected) values (7,'abcd anouns abcd','N');
insert into test_table (id, col, matchexpected) values (8,'abcd inoun','N');

Example Query:

select * 
from test_table
where regexp_like (col,'(^|\s)noun(e?s)?(\s|$)','i');

Or use \W (non-word character: [^A-Za-z0-9_]) in the regex. Instead of \s (whitespace). To match also strings like "abc nounes!".

select * 
from test_table
where regexp_like (col,'(^|\W)noun(e?s)?(\W|$)','i');

Result:
The first 5 id's.

Optimizing Regexp in Oracle

Answers (1)

Related Questions