user1191027

Regex Multiple Negative Lookahead

Here is my regex pattern: [Ss]ection\s\d+(?![a-zA-z])(?!</ref>)

For example, it should match: section 5 or section 50

For example, it should not match: section 5A or section 5</ref> or section 5A</ref> or section 50A

The problem is that in practice it does not match them correctly: http://regexr.com?33ien

Not sure what's wrong with the pattern though...

Upvotes: 3

Views: 7206

Answers (3)

Pshemo

Reputation: 124215

Maybe try [Ss]ection\s\d++(?![a-zA-Z])(?!</ref>). Here ++ is a possessive quantifier. It behaves like a greedy quantifier, except that it never gives back any part of the string it has matched, so later parts of the regex cannot reclaim that text through backtracking.

Example

System.out.println("ababab".matches("(ab)++ab")); 
// prints false since last "ab" is possessed by (ab)++ 
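To see the same effect on the pattern from the question, here is a small sketch that runs the greedy and the possessive variants over one sample string. The class name PossessiveDemo, the helper findAll, and the sample text are made up for illustration, and the character class is written as [a-zA-Z] on the assumption that [a-zA-z] in the question is a typo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Greedy \d+ may give back digits so that the lookaheads can succeed.
    public static final Pattern GREEDY =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z])(?!</ref>)");
    // Possessive \d++ keeps every digit it matched; no backtracking into it.
    public static final Pattern POSSESSIVE =
            Pattern.compile("[Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>)");

    // Hypothetical helper: collects every match of p found in input.
    public static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "section 5, section 50, section 5A, section 50A, section 50</ref>";
        // prints [section 5, section 50, section 5, section 5]
        System.out.println(findAll(GREEDY, text));
        // prints [section 5, section 50]
        System.out.println(findAll(POSSESSIVE, text));
    }
}
```

The two extra "section 5" results in the greedy run come from "section 50A" and "section 50</ref>", where \d+ gives up the 0 so that both lookaheads see a harmless digit.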

Upvotes: 8

Anton Kovalenko

Reputation: 21507

This one should work:

[Ss]ection\s\d+(?!\d)(?![a-zA-Z])(?!</ref>)

I've explained a problem with how we tend to think about regexp lookaheads in "Strangeness with negative lookahead assertion in Java regular expression"; it applies here as well.

The situation here is slightly different: the negative lookahead succeeds when we don't want it to, because the matcher will accept a shorter match for the part before the lookahead if that lets the expression as a whole match. That's why it's important to pin down where the matched input ends when you use a lookahead: with a word boundary, an anchor such as $, or some assertion about the following text (in my proposed solution, that the next character is not a digit).
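A rough sketch of that backtracking, comparing the original pattern with the (?!\d) version and with a word-boundary variant. The class LookaheadBoundaryDemo, the helper firstMatch, and the \b pattern are my own illustration rather than part of the answer, and [a-zA-Z] is assumed in place of [a-zA-z].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadBoundaryDemo {
    // Pattern from the question: nothing stops \d+ from giving back digits.
    public static final String ORIGINAL =
            "[Ss]ection\\s\\d+(?![a-zA-Z])(?!</ref>)";
    // Proposed fix: the digit run must not be followed by another digit.
    public static final String DIGIT_GUARD =
            "[Ss]ection\\s\\d+(?!\\d)(?![a-zA-Z])(?!</ref>)";
    // Illustrative alternative: require a word boundary after the digits.
    public static final String WORD_BOUNDARY =
            "[Ss]ection\\s\\d+\\b(?!</ref>)";

    // Hypothetical helper: returns the first match, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"section 50", "section 50A", "section 50</ref>"};
        for (String input : inputs) {
            System.out.println(input
                    + " | original: " + firstMatch(ORIGINAL, input)
                    + " | digit guard: " + firstMatch(DIGIT_GUARD, input)
                    + " | word boundary: " + firstMatch(WORD_BOUNDARY, input));
        }
        // prints:
        // section 50 | original: section 50 | digit guard: section 50 | word boundary: section 50
        // section 50A | original: section 5 | digit guard: null | word boundary: null
        // section 50</ref> | original: section 5 | digit guard: null | word boundary: null
    }
}
```

In the last two inputs the original pattern settles for "section 5", because after dropping the 0 the next character is a digit and neither lookahead objects to it.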

Upvotes: 1

Pilou

Reputation: 1478

The matches are not wrong: your regex asks for "section " followed by one or more digits that are not followed by a letter or by "</ref>".

That's true for section 50A:

section 5 is followed by 0A, and that's not excluded by your negative lookaheads.

You can do something like:

[Ss]ection\s\d+(?![a-zA-Z0-9])(?!</ref>)
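As a quick check, this sketch runs the pattern above against the examples from the question. The class SectionPatternCheck and the helper hasMatch are hypothetical names; find() is used instead of matches() on the assumption that the pattern is meant to search inside longer text.

```java
import java.util.regex.Pattern;

public class SectionPatternCheck {
    // Letters and digits are both excluded after the digit run, so
    // backtracking to a shorter number can no longer satisfy the lookahead.
    public static final Pattern SECTION =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z0-9])(?!</ref>)");

    // Hypothetical helper: true if the pattern matches anywhere in input.
    public static boolean hasMatch(String input) {
        return SECTION.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] examples = {
            "section 5", "section 50",                 // should match
            "section 5A", "section 5</ref>",           // should not match
            "section 5A</ref>", "section 50A"          // should not match
        };
        for (String example : examples) {
            System.out.println(example + " -> " + hasMatch(example));
        }
        // prints:
        // section 5 -> true
        // section 50 -> true
        // section 5A -> false
        // section 5</ref> -> false
        // section 5A</ref> -> false
        // section 50A -> false
    }
}
```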

Upvotes: 2
