Reputation: 8079
I need a mechanism to find a number in a string that is surrounded either by non-digit characters, by the start/end of the string, or by a special delimiter (43 in this case). Here are some examples:
All of those should result in a match of 12345678. Currently I'm using the following regular expression:
(?<=^|\D|43)([0-9]{8})(?=$|\D|43)
This expression works pretty well, but it has one flaw: if the number starts with 43 but doesn't end with 43, I still get a positive result. Here are examples where I get those 'false' results:
What I need now is a construct for the regex to know if the matching string started with 43 and then only return it as a positive result if it also ends with 43.
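To make the flaw concrete, here is a minimal Python sketch. Python's re module rejects lookbehinds whose alternatives differ in length, so the lookbehind is split into two assertions that should accept the same positions for ASCII input. The sample inputs and the helper name find_numbers are made up for illustration and are not the examples from the question.

```python
import re

# Adaptation (assumption): (?<=^|\D|43) is not fixed-width, which Python's
# re module rejects, so it is split into (?<![0-9]) and (?<=43).
ORIGINAL = re.compile(r"(?:(?<![0-9])|(?<=43))([0-9]{8})(?=$|[^0-9]|43)")


def find_numbers(text):
    """Return every 8-digit run the original pattern accepts in text."""
    return [m.group(1) for m in ORIGINAL.finditer(text)]


# Hypothetical inputs, not taken from the question.
print(find_numbers("abc12345678xyz"))  # ['12345678'], delimited by non-digits
print(find_numbers("431234567843"))    # ['12345678'], 43 on both sides
print(find_numbers("4312345678"))      # ['12345678'], the unwanted match
```

The last call shows the problem: the leading 43 is accepted as a delimiter even though the end of the string, not another 43, closes the number.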
Upvotes: 1
Views: 463
Reputation: 626845
You may use a positive lookahead check in the lookbehind:
(?<=^|\D|43(?=[0-9]{8}43))[0-9]{8}(?=43|\D|$)
The added part is the lookahead (?=[0-9]{8}43) placed right after the 43 inside the lookbehind.
See the regex demo.
Now, a leading 43 is only accepted as a delimiter if the 8 digits are also followed by 43.
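As a rough check of this idea, here is a Python sketch. It is an adaptation, not the pattern above verbatim: Python's re module needs fixed-width lookbehinds, so the lookahead sits right after (?<=43) instead of inside one combined lookbehind, which should assert the same thing. The inputs and the name find_delimited are hypothetical.

```python
import re

# Assumption: Python's re needs fixed-width lookbehinds, so the lookahead is
# placed right after (?<=43) instead of inside one combined lookbehind.
FIXED = re.compile(
    r"(?:(?<![0-9])|(?<=43)(?=[0-9]{8}43))"  # start, non-digit, or 43 with a closing 43
    r"[0-9]{8}"                              # exactly 8 digits
    r"(?=43|[^0-9]|$)"                       # closing 43, non-digit, or end of string
)


def find_delimited(text):
    """Return every 8-digit run accepted by the adapted pattern."""
    return [m.group(0) for m in FIXED.finditer(text)]


# Hypothetical inputs for illustration.
print(find_delimited("abc12345678xyz"))  # ['12345678']
print(find_delimited("431234567843"))    # ['12345678']
print(find_delimited("4312345678"))      # [], leading 43 without a closing 43
```

In engines that allow alternatives of different lengths in a lookbehind (.NET, PCRE, Java), the pattern from the answer can be used unchanged.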
Details:
(?<=^|\D|43(?=[0-9]{8}43)) - match a location in string that is immediately preceded with:
  ^ - start of string
  \D - a non-digit symbol
  43(?=[0-9]{8}43) - 43 substring that is followed with any 8 digits and then 43 substring
[0-9]{8} - exactly 8 digits
(?=43|\D|$) - the 8 digits must be followed with:
  43 - 43 digit sequence
  \D - (= [^0-9]) any non-digit symbol
  $ - end of string.

And here is my own conditional based regex for the same task (can be used in .NET, PCRE, but not Java):
(?<=^|[^0-9]|(43))[0-9]{8}(?=(?(1)43|(?:[^0-9]|$)))
Here is a RegexStorm demo that is useful when testing out .NET regexps.
Some background info on the Conditional construct:
This language element attempts to match one of two patterns depending on whether it can match an initial pattern. Its syntax is:
(?( expression ) yes | no )
where expression is the initial pattern to match, yes is the pattern to match if expression is matched, and no is the optional pattern to match if expression is not matched. The regular expression engine treats expression as a zero-width assertion; that is, the regular expression engine does not advance in the input stream after it evaluates expression.
So, the (43) in the lookbehind gets captured into Group 1 and then, inside the conditional (?(1)43|(?:[^0-9]|$)), the (?(1)...) part checks if Group 1 was matched at all. If yes, 43 is matched; else, (?:[^0-9]|$) is tried (any non-digit or the end of string).
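Python's re module also supports the (?(1)yes|no) conditional, so the behaviour can be sketched there. This is an adaptation under one assumption: because re requires fixed-width lookbehinds, the start/non-digit case is written as (?<![0-9]) and the capturing (43) gets its own lookbehind. The inputs and the name find_conditional are hypothetical.

```python
import re

CONDITIONAL = re.compile(
    r"(?:(?<![0-9])|(?<=(43)))"   # group 1 is set only when 43 precedes the digits
    r"[0-9]{8}"                   # exactly 8 digits
    r"(?=(?(1)43|(?:[^0-9]|$)))"  # group 1 set: require 43; else non-digit or end
)


def find_conditional(text):
    """Return the 8-digit runs; group 1 only records a leading 43."""
    return [m.group(0) for m in CONDITIONAL.finditer(text)]


# Hypothetical inputs for illustration.
print(find_conditional("abc12345678xyz"))  # ['12345678']
print(find_conditional("431234567843"))    # ['12345678']
print(find_conditional("4312345678"))      # [], group 1 is set but no 43 follows
```

Unlike the lookahead-in-lookbehind version, this one also ties the closing 43 to the opening one: a number that starts at the beginning of the string or after a non-digit is not accepted when it is followed by 43.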
Upvotes: 3
Reputation: 43166
You can use a conditional:
(?<=^|\D|(43))[0-9]{8}(?(1)(?=43)|(?=$|\D))
The first 43 is captured in group 1, and later the conditional queries whether group 1 matched anything.
In case your regex engine doesn't support conditionals, you can try this "make your own conditional" workaround:
(?<=^|\D|()43)[0-9]{8}(?=(?:\1(?:43)|(?!\1)(?:\D|$)))
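The idea is to replace a conditional (text)(?(1)a|b) with an alternation like this: ()text(?:\1a|(?!\1)b). In most engines, an empty group that took part in the match makes \1 succeed (it matches the empty string), while a reference to a group that did not take part fails, so (?!\1) selects the other branch. JavaScript is a notable exception, since its unset backreferences match the empty string.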
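The empty-group trick can be tried out in Python as a sketch. Python has real conditionals, so this is only to illustrate the workaround. As an assumption forced by re's fixed-width lookbehind rule, the start/non-digit case is written as (?<![0-9]) and the ()43 part gets its own lookbehind. The inputs and the name find_emulated are hypothetical.

```python
import re

EMULATED = re.compile(
    r"(?:(?<![0-9])|(?<=()43))"  # empty group 1 is set only when 43 precedes
    r"[0-9]{8}"                  # exactly 8 digits
    r"(?=(?:\1(?:43)"            # group 1 set: \1 matches "", then require 43
    r"|(?!\1)(?:[^0-9]|$)))"     # group 1 unset: \1 fails, so (?!\1) passes
)


def find_emulated(text):
    """Return the 8-digit runs accepted by the emulated conditional."""
    return [m.group(0) for m in EMULATED.finditer(text)]


# Hypothetical inputs for illustration.
print(find_emulated("abc12345678xyz"))  # ['12345678']
print(find_emulated("431234567843"))    # ['12345678']
print(find_emulated("4312345678"))      # [], \1 is set but no closing 43
```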
Upvotes: 2