Nick Locking

Regex issue using ICU regex to find numbers not inside parentheses

I'm trying to scan a given string for a number. The number cannot be after "v/v./vol/vol.", and cannot be inside parentheses. Here's what I have:

NSString *regex = @"(?i)(?<!v|vol|vol\\.|v\\.)\\d{1,4}(?![\\(]{0}.*\\))";
NSLog(@"Result: %@", [@"test test test 4334 test test" stringByMatching:regex]);
NSLog(@"Result: %@", [@"test test test(4334) test test" stringByMatching:regex]);
NSLog(@"Result: %@", [@"test test test(vol.4334) test test" stringByMatching:regex]);

Infuriatingly, this does not work. My regex can be separated into four parts:

(?i) - make regex case insensitive

(?<!v|vol|vol\\.|v\\.) - negative look-behind assertion for v/v./vol/vol.

\\d{1,4} - the number I'm looking for, 1-4 digits.

(?![\\(]{0}.*\\)) - negative look-ahead assertion: the number cannot be followed by a ), unless a ( comes before that ).

Maddeningly, if I take out the look-behind assertion, it works. What's the issue here? I'm using RegexKitLite, which uses the ICU regex syntax.

Upvotes: 2

Answers (2)

Nick Locking

Finally ended up with this regex:

(?i)\\d{1,4}(?<!v|vol|vol\\.|v\\.)(?![^\\(]*\\))

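Both lookarounds needed to change: the negative look-behind moved after the digits, and the negative look-ahead now uses [^\\(]* instead of [\\(]{0}.*. Passes all my tests. Thanks to Alex for identifying the positioning of my NLB being wrong.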
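
For anyone who wants to re-run the checks without RegexKitLite, here is a minimal sketch in Java. It assumes java.util.regex treats these lookarounds the same way ICU does, which is an assumption and not something verified against ICU. The class name FinalRegexCheck, the helper firstMatch, and the fourth input ("vol." outside parentheses) are hypothetical additions, not part of the original tests.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalRegexCheck {
    // The regex from this answer, written as a Java string literal.
    static final Pattern FINAL = Pattern.compile(
        "(?i)\\d{1,4}(?<!v|vol|vol\\.|v\\.)(?![^\\(]*\\))");

    // Returns the first match in the input, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher m = FINAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "test test test 4334 test test",      // prints 4334
            "test test test(4334) test test",     // prints null
            "test test test(vol.4334) test test", // prints null
            "test vol.4334 test"                  // hypothetical extra case, prints 4334
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

In this sketch the three original strings behave as intended, but the extra case still returns 4334: a look-behind placed after \\d{1,4} inspects text that ends in a digit, so it can never see "vol.". If numbers after "vol." outside parentheses matter, that case is worth adding to the test set.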

Upvotes: 1

Alex

Your negative lookbehind is positioned incorrectly. Lookbehinds do not modify the input position, so your negative lookbehind should come after your \d{1,4} expression:

(?i)\\d{1,4}(?<!v|vol|vol\\.|v\\.)(?![\\(]{0}.*\\))

Alternatively, just use a negative lookahead to accomplish the same purpose:

(?i)(?!v|vol|vol\\.|v\\.)\\d{1,4}(?![\\(]{0}.*\\))
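
Since lookarounds are zero-width, where one sits decides which position it inspects, and the engine is free to retry the whole pattern one character further along. The sketch below is in Java rather than RegexKitLite, on the assumption that java.util.regex and ICU agree on these constructs. It leaves out the parenthesis check to isolate the look-behind. The names LookbehindPlacement, BEFORE, GUARDED, and firstMatch are hypothetical, and the extra (?<![0-9]) guard is a suggestion of this sketch, not something from the patterns above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindPlacement {
    // Look-behind before the digits, as in the question.
    static final Pattern BEFORE = Pattern.compile(
        "(?i)(?<!v|vol|vol\\.|v\\.)\\d{1,4}");

    // Hypothetical variant: additionally refuse to start in the middle of a number.
    static final Pattern GUARDED = Pattern.compile(
        "(?i)(?<![0-9])(?<!v|vol|vol\\.|v\\.)\\d{1,4}");

    // Returns the first match in the input, or null when nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The attempt at the first digit is rejected, then the engine
        // retries at the second digit, where the look-behind sees "4".
        System.out.println(firstMatch(BEFORE, "vol.4334"));        // prints 334
        // With the digit guard, no later start position is accepted.
        System.out.println(firstMatch(GUARDED, "vol.4334"));       // prints null
        System.out.println(firstMatch(GUARDED, "test 4334 test")); // prints 4334
    }
}
```

Under these assumptions, a look-behind in front of the digits does reject the position right after "vol.", but without the digit guard the match simply starts one digit later.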

Upvotes: 3
