Reputation: 631
I want to match all phone numbers that are wrapped between << and >> delimiters.
This is my regex for phone numbers:
0[2349]{1}\-[1-9]{1}[0-9]{6}
I tried to add a lookahead (and a lookbehind) like (?=(?:>>))
but this didn't work for me.
Upvotes: 1
Views: 5278
Reputation: 383876
The following seems to work (as seen on ideone.com):
Regex r = new Regex(@"(?s)<<(?:(?!>>)(?:(0[2349]\-[1-9][0-9]{6})|.))*>>");
Each <<...>> section is a Match, and all phone numbers in that section will be captured in Groups[1].Captures.
First of all, I simplified your phone number pattern to:
0[2349]\-[1-9][0-9]{6}
That is, each {1} is superfluous, so I removed them (see Using explicitly numbered repetition instead of question mark, star and plus).
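As a quick sanity check of that simplification, here is a minimal Python sketch (Python's `re` rather than .NET, and the sample strings are made up for illustration) that compares the two patterns on a handful of inputs:

```python
import re

# Original pattern from the question, with explicit {1} quantifiers.
VERBOSE_PHONE = re.compile(r"0[2349]{1}\-[1-9]{1}[0-9]{6}")
# Simplified pattern: the {1} quantifiers removed, nothing else changed.
SIMPLE_PHONE = re.compile(r"0[2349]\-[1-9][0-9]{6}")

# Hypothetical sample strings, invented for this check.
samples = ["02-1234567", "09-9876543", "05-1234567", "03-0234567", "04-123456"]


def same_verdict(text):
    """True when both patterns agree on whether the whole text is a phone number."""
    return bool(VERBOSE_PHONE.fullmatch(text)) == bool(SIMPLE_PHONE.fullmatch(text))


# True only if the two patterns agree on every sample.
agreement = all(same_verdict(s) for s in samples)
print(agreement)
```

A handful of samples is not a proof, of course; the patterns agree because {1} means "exactly once", which is what an unquantified token already means.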
Then, let's try to match each <<...>> section. Let's start at:
(?s)<<((?!>>).)*>>
This will match each <<...>> section. The . that consumes the body is guarded by a negative lookahead (?!>>), so that we don't run past the closing >>.
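To see the guarded dot at work, here is a small Python sketch (Python's `re`, not .NET; the sample text is invented) that applies this intermediate pattern to input with two sections, one of which spans a line break:

```python
import re

# The intermediate pattern from the answer: (?s) lets . match newlines,
# and (?!>>) stops the repeated . before it can consume a closing >>.
SECTION = re.compile(r"(?s)<<((?!>>).)*>>")

# Hypothetical input with two sections; the first contains a newline.
text = "a <<one\ntwo>> b <<three>> c"

# Each match is one complete <<...>> section.
sections = [m.group(0) for m in SECTION.finditer(text)]
print(sections)
```

Without the lookahead, a greedy .* would run from the first << to the last >> and merge both sections into a single match.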
Then, instead of matching ., we give priority to matching your phone number. That is, we replace . with
(phonenumber|.)
Then I simply made some groups non-capturing, so that the phone number is the only thing captured by group 1, and that's pretty much it. The fact that .NET regex stores all captures made by a group in a single match took care of the rest.
Upvotes: 3
Reputation: 244948
I think gnarf's (and Arkain's) suggestion is very sensible – you don't have to use one regex to do all the work.
But if you really want to use one hard-to-read, unportable regex, here you go. It relies on variable-length lookbehind, which .NET supports but most other regex engines do not:
(?<=<<(?:>?[^>])*)0[2349]{1}\-[1-9]{1}[0-9]{6}(?=(?:<?[^<])*>>)
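As an illustration of the portability caveat, this hedged Python sketch tries to compile the same pattern with Python's standard `re` module, which only accepts fixed-width lookbehind. The helper name `compiles_in_python_re` is made up for this example:

```python
import re

# The pattern from this answer, copied verbatim; the lookbehind contains
# a * quantifier, so its width is unbounded.
DOTNET_ONLY = r"(?<=<<(?:>?[^>])*)0[2349]{1}\-[1-9]{1}[0-9]{6}(?=(?:<?[^<])*>>)"


def compiles_in_python_re(pattern):
    """Hypothetical helper: report whether Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python's re rejects lookbehind whose width is not fixed.
        return False


portable = compiles_in_python_re(DOTNET_ONLY)
print(portable)
```

This only shows that one other engine rejects the pattern; it says nothing about how the pattern behaves in .NET itself.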
Upvotes: 0
Reputation: 7937
This can easily be done with two regex patterns:
To identify each section (lazily, so that two sections in the same input are not merged into one match):
<<.*?>>
Use the second regex on the matches from the first:
0[2349]-[1-9]\d{6}
Remember to set the dot to match newlines, in case a section spans several lines. I know it isn't exactly what you were asking for, but it will work.
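The two-step idea can be sketched as follows. This is a Python sketch rather than .NET code, the function name `phones_in_sections` and the sample text are invented for illustration, and the section pattern uses the lazy quantifier .*? so that each <<...>> pair is matched separately when the input holds several:

```python
import re

# Step 1: find each <<...>> section. DOTALL lets . match newlines, and
# the lazy .*? stops at the first closing >> after each opening <<.
SECTION = re.compile(r"<<(.*?)>>", re.DOTALL)
# Step 2: the phone number pattern, applied only inside a section body.
PHONE = re.compile(r"0[2349]-[1-9]\d{6}")


def phones_in_sections(text):
    """Return every phone number that appears inside a <<...>> section."""
    found = []
    for section in SECTION.finditer(text):
        # group(1) is the section body without the delimiters.
        found.extend(PHONE.findall(section.group(1)))
    return found


# Hypothetical input: numbers outside the delimiters must be ignored.
sample = "02-1111111 <<call 03-2222222\nor 04-3333333>> 09-4444444 <<x 02-5555555>>"
result = phones_in_sections(sample)
print(result)
```

Splitting the work this way keeps both patterns short and avoids engine-specific features such as repeated captures or variable-length lookbehind.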
Upvotes: 0