Reputation:
Here is my regex pattern: [Ss]ection\s\d+(?![a-zA-z])(?!</ref>)
For example, it should match: section 5
or section 50
For example, it should not match: section 5A
or section 5</ref>
or section 5A</ref>
or section 50A
The problem is that in practice it matches some of them incorrectly: http://regexr.com?33ien
I'm not sure what's wrong with the pattern, though.
Upvotes: 3
Views: 7206
Reputation: 124215
Maybe try [Ss]ection\s\d++(?![a-zA-z])(?!</ref>). The ++ is a possessive quantifier. It behaves like a greedy quantifier, except that it never gives back what it has matched, so later parts of the regex cannot reuse those characters through backtracking.
Example
System.out.println("ababab".matches("(ab)++ab"));
// prints false since last "ab" is possessed by (ab)++
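To tie this back to the question, here is a minimal sketch that runs the possessive version over the sample inputs. The class name PossessiveSectionDemo and the helper findAll are made up for illustration. The letter class is written [a-zA-Z] here, because [a-zA-z] also covers a few punctuation characters such as [ and _ that sit between Z and a in ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class PossessiveSectionDemo {
    // \d++ keeps every digit it consumed, so the lookaheads are only ever
    // tested at the position right after the full run of digits.
    static final Pattern SECTION =
            Pattern.compile("[Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>)");

    // Collects every substring of the input that the pattern matches.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "section 5, section 50, section 5A, section 5</ref>, "
                + "section 5A</ref>, section 50A";
        System.out.println(findAll(text)); // prints [section 5, section 50]
    }
}
```

Only the two wanted inputs survive; "section 50A" no longer yields a partial "section 5" match because the engine cannot give the 0 back.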
Upvotes: 8
Reputation: 21507
This one should work:
[Ss]ection\s\d+(?!\d)(?![a-zA-z])(?!</ref>)
I've explained a problem with how we tend to think about regexp lookaheads in "Strangeness with negative lookahead assertion in Java regular expression"; it applies here as well.
The situation here is slightly different: the negative lookahead succeeds when we don't want it to, because the matcher will accept a shorter match for the part before the lookahead if that lets the expression as a whole match. That's why it's important to pin down where the matched input should end when you use a lookahead: a word boundary, an anchor such as $, or some assertion about the following text (in my proposed solution, not being followed by a digit).
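The backtracking can be seen directly with a small sketch. The class LookaheadBacktrackingDemo and the helper firstMatch are hypothetical names used only for this illustration, and the letter class is written [a-zA-Z] rather than [a-zA-z].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class LookaheadBacktrackingDemo {
    // The pattern from the question (letter class written as [a-zA-Z]).
    static final Pattern ORIGINAL =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z])(?!</ref>)");
    // Same pattern with (?!\d) added, so \d+ cannot stop in the middle of a number.
    static final Pattern WITH_DIGIT_GUARD =
            Pattern.compile("[Ss]ection\\s\\d+(?!\\d)(?![a-zA-Z])(?!</ref>)");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d+ first takes "50", the lookahead sees 'A' and fails, so the
        // engine backs off to "5"; the next character is '0', which passes.
        System.out.println(firstMatch(ORIGINAL, "section 50A"));         // prints section 5
        System.out.println(firstMatch(WITH_DIGIT_GUARD, "section 50A")); // prints null
        System.out.println(firstMatch(WITH_DIGIT_GUARD, "section 50"));  // prints section 50
    }
}
```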
Upvotes: 1
Reputation: 1478
The matches are not wrong: your regex asks for "section " followed by one or more digits that are not followed by a letter or by "</ref>".
That is true for section 50A: section 5 is followed by 0A, and that is not covered by your negative lookahead.
You can do something like:
[Ss]ection\s\d+(?![a-zA-Z0-9])(?!</ref>)
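As a quick check, here is a rough sketch that tries this pattern against each sample from the question. The class CombinedLookaheadDemo and the method found are illustrative names, not anything from the original post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class CombinedLookaheadDemo {
    // Digits are part of the forbidden "next character" set, so backing off
    // from "50" to "5" no longer satisfies the lookahead.
    static final Pattern SECTION =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z0-9])(?!</ref>)");

    // True if the pattern matches anywhere in the input.
    static boolean found(String input) {
        return SECTION.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "section 5", "section 50",                 // should match
            "section 5A", "section 5</ref>",           // should not match
            "section 5A</ref>", "section 50A"          // should not match
        };
        for (String s : samples) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```

With these inputs, only the first two lines report true.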
Upvotes: 2