shivesh

Reputation: 631

Regex match pattern inside a wrapping pattern

I want to match all phone numbers that are wrapped between << and >> tags.
This is my regex for phone numbers:

0[2349]{1}\-[1-9]{1}[0-9]{6}

I tried to add a lookahead (and a lookbehind) such as (?=(?:>>)), but this didn't work for me.

DEMO

Upvotes: 1

Views: 5278

Answers (5)

MrFox

Reputation: 5126

<<0[2349]{1}\-[1-9]{1}[0-9]{6}>>

Upvotes: 0

polygenelubricants

Reputation: 383876

The following seems to work (as seen on ideone.com):

Regex r = new Regex(@"(?s)<<(?:(?!>>)(?:(0[2349]\-[1-9][0-9]{6})|.))*>>");

Each <<...>> section is a Match, and all phone numbers in that section are captured in Groups[1].Captures.
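For illustration, here is a minimal sketch of how those captures could be read back. The class name, method name and sample text are made up for this example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WrappedPhoneCaptures
{
    // Pattern from the answer; group 1 is the phone number.
    static readonly Regex Sections = new Regex(
        @"(?s)<<(?:(?!>>)(?:(0[2349]\-[1-9][0-9]{6})|.))*>>");

    // Returns every phone number found inside any <<...>> section.
    public static List<string> Find(string input)
    {
        var numbers = new List<string>();
        foreach (Match section in Sections.Matches(input))
        {
            // .NET keeps one Capture for each time group 1 matched in this section.
            foreach (Capture c in section.Groups[1].Captures)
            {
                numbers.Add(c.Value);
            }
        }
        return numbers;
    }

    static void Main()
    {
        // Hypothetical sample text: 04-1111111 sits outside any section.
        string text = "x <<a 02-1234567 b 03-7654321>> 04-1111111 <<09-2222222>>";
        Console.WriteLine(string.Join(", ", Find(text)));
        // prints: 02-1234567, 03-7654321, 09-2222222
    }
}
```

Reading Groups[1].Value instead would give only the last number captured in each section, which is why the loop goes through Captures.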

How the pattern is constructed

First of all, I simplified your phone number pattern to:

0[2349]\-[1-9][0-9]{6}

That is, each {1} is superfluous, so I dropped them (see Using explicitly numbered repetition instead of question mark, star and plus).

Then, let's try to match each <<...>> section. Let's start at:

(?s)<<((?!>>).)*>>

This will match each <<...>> section. The repeated . that matches the body is guarded by a negative lookahead (?!>>), so the match cannot run past the closing >>.
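As a quick check of this intermediate step, a sketch like the following lists the sections that the guarded pattern finds. The class name and the sample text are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SectionMatcher
{
    // (?s) lets . match newlines; (?!>>) stops the loop just before the closing >>.
    static readonly Regex Section = new Regex(@"(?s)<<((?!>>).)*>>");

    // Returns the full text of every <<...>> section, delimiters included.
    public static List<string> Find(string input)
    {
        var sections = new List<string>();
        foreach (Match m in Section.Matches(input))
        {
            sections.Add(m.Value);
        }
        return sections;
    }

    static void Main()
    {
        // Hypothetical sample text; the second section spans two lines.
        string text = "x <<a 02-1234567>> 04-1111111 <<b\n09-2222222>>";
        foreach (string s in Find(text))
        {
            Console.WriteLine(s);
        }
    }
}
```

Without the lookahead guard, a greedy .* would run from the first << to the last >> and swallow the text between the sections.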

Then, rather than matching only ., we give priority to matching your phone number. That is, we replace . with

(phonenumber|.)

Then I simply made some groups non-capturing, so the phone number is captured as group 1 (\1), and that's pretty much it. Because .NET regex keeps every capture a group makes within a single match, that took care of the rest.

Upvotes: 3

svick

Reputation: 244948

I think gnarf's (and Arkain's) suggestion is very sensible – you don't have to use one regex to do all the work.

But if you really want to use one hard-to-read, unportable regex (it relies on variable-length lookbehind, which .NET supports but most other regex engines do not), here you go:

(?<=<<(?:>?[^>])*)0[2349]{1}\-[1-9]{1}[0-9]{6}(?=(?:<?[^<])*>>)
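A minimal sketch of using that pattern from C# might look like this. The class name and sample text are made up; each Match is already a phone number, so no groups are needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundPhones
{
    // Lookbehind: a << earlier with no >> in between.
    // Lookahead: a >> later with no << in between.
    static readonly Regex Wrapped = new Regex(
        @"(?<=<<(?:>?[^>])*)0[2349]{1}\-[1-9]{1}[0-9]{6}(?=(?:<?[^<])*>>)");

    // Returns every phone number that sits inside a <<...>> section.
    public static List<string> Find(string input)
    {
        var numbers = new List<string>();
        foreach (Match m in Wrapped.Matches(input))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }

    static void Main()
    {
        // Hypothetical sample text: 04-1111111 sits outside any section.
        string text = "x <<a 02-1234567 b 03-7654321>> 04-1111111 <<09-2222222>>";
        Console.WriteLine(string.Join(", ", Find(text)));
        // prints: 02-1234567, 03-7654321, 09-2222222
    }
}
```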

Upvotes: 0

Jesper Fyhr Knudsen

Reputation: 7937

This can easily be done with two regex patterns:

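To identify each section:

<<.*?>>

Then run the second regex on the matches from the first:

0[2349]-[1-9]\d{6}

Remember to set the dot to match newlines (RegexOptions.Singleline in .NET), and keep the quantifier lazy (.*?) so that a greedy .* does not run from the first << to the last >>. I know it isn't exactly what you were asking for, but it will work.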
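The two passes could be sketched in C# roughly as follows. The class name and sample text are assumptions, and the section pattern here uses the lazy form .*? so that each <<...>> section is matched separately when the text contains more than one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TwoPassPhones
{
    // Pass 1: one <<...>> section per match; Singleline lets . match newlines.
    static readonly Regex Section = new Regex(@"<<.*?>>", RegexOptions.Singleline);

    // Pass 2: the phone number pattern, applied only to section text.
    static readonly Regex Phone = new Regex(@"0[2349]-[1-9]\d{6}");

    // Returns every phone number found inside any <<...>> section.
    public static List<string> Find(string input)
    {
        var numbers = new List<string>();
        foreach (Match section in Section.Matches(input))
        {
            foreach (Match phone in Phone.Matches(section.Value))
            {
                numbers.Add(phone.Value);
            }
        }
        return numbers;
    }

    static void Main()
    {
        // Hypothetical sample text: 04-1111111 sits outside any section.
        string text = "x <<a 02-1234567 b 03-7654321>> 04-1111111 <<09-2222222>>";
        Console.WriteLine(string.Join(", ", Find(text)));
        // prints: 02-1234567, 03-7654321, 09-2222222
    }
}
```

This version uses nothing specific to .NET's regex engine, so the same two patterns should carry over to most other engines.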

Upvotes: 0

nsmyself

Reputation: 3565

I asked a similar question some time ago, using brackets ([]) instead of <<>>:

Link here

This should really help. Cheers

Edit: It should handle your demo with no problem.

Upvotes: 0
