Vincent Dagpin

Reputation: 3611

Regex: matching defined tags in C#

I have this string:

 This is a <143>sample</143> regex <143>pa<665>t</665>tern</143> selection <143>by</143> tags in <128>c#</128> and my <132>name</132> is <175>Vincent</175>

and I'm supposed to get just the matches by tag. I'm using it for highlighting text.

Expected output:

<143>sample</143>
<143>pa<665>t</665>tern</143>
<665>t</665>
<143>by</143>
<128>c#</128>
<132>name</132>
<175>Vincent</175>

I tried this regex pattern:

<(143|128|132|175)>.*</(143|128|132|175)> 

but it returns everything as a single match, from the first opening tag to the last closing tag, which is practically the whole string.

Any help, please?


Follow up question

Instead of getting the whole matched element, can I get only the text inside the tag? For example, just sample instead of <143>sample</143>.

Upvotes: 1

Views: 391

Answers (3)

Oybek

Reputation: 7243

As was said, you should use lazy matching here. You get it by appending ? to your quantifier, which in your case is *.

Further, to simplify your work you could use a named capture group, which .NET fully supports. Here is some sample code:

using System;
using System.Text.RegularExpressions;

var target = @"This is a <143>sample</143> regex <143>pattern</143> selection <143>by</143> tags in <128>c#</128> and my <132>name</132> is <175>Vincent</175>";
// .*? stops at the first closing tag; \1 requires that tag to repeat the opening number.
var pattern = new Regex(@"<(143|128|132|175)>(?<Content>.*?)</\1>");
foreach (Match match in pattern.Matches(target)) {
    // The named group holds only the text between the tags.
    Console.WriteLine(match.Groups["Content"].Value);
}

Upvotes: 1

barsju

Reputation: 4446

It's because the .* is greedy.

You can either make it non-greedy by adding a ?: .*?

or

You can make it match anything but the '<': [^<]*

I usually go for the last one because it is easier to remember and works in most cases.
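
To see the difference side by side, here is a minimal sketch that runs the greedy, lazy and negated-class variants over a shortened input. The class name QuantifierDemo, the helper MatchAll and the shortened string (no nested tags, only two tag numbers) are assumptions made for this example, not part of the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Shortened sample input without nested tags (assumption for this sketch).
    public const string Input = "a <143>sample</143> b <143>by</143> c <128>c#</128>";

    // Returns the full text of every match of the given pattern in Input.
    public static string[] MatchAll(string pattern) =>
        Regex.Matches(Input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Greedy: one match running from the first <143> to the final </128>.
        Console.WriteLine(string.Join(" | ", MatchAll(@"<(143|128)>.*</(143|128)>")));

        // Lazy: three matches, one per element.
        Console.WriteLine(string.Join(" | ", MatchAll(@"<(143|128)>.*?</\1>")));

        // Negated class: the same three matches on this input.
        Console.WriteLine(string.Join(" | ", MatchAll(@"<(143|128)>[^<]*</\1>")));
    }
}
```

Note that [^<]* cannot cross any '<', so an element that contains another tag, such as <143>pa<665>t</665>tern</143>, is skipped entirely by that variant.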

Upvotes: 0

Mark Byers

Reputation: 838256

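These two changes should help you: make the quantifier lazy (.*? instead of .*), and use a backreference (\1) so the closing tag has to match the opening tag.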

Try this:

<(143|128|132|175)>.*?</\1> 

Regarding "ah yeah i forgot.. it is nested tags": then it's probably not wise to use regular expressions. Nested tags aren't a regular language.
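
If the tags really can nest, one alternative is to use the regex only to find the individual tags and pair them up with a stack. The following is a rough sketch of that idea, not a full parser: the names NestedTagScanner, TagSpan and Scan are made up for this example, it assumes every tag has the numeric <123> / </123> form, and it simply ignores a closing tag that does not pair with the innermost open tag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NestedTagScanner
{
    // One tagged span: tag number, start offset, full element, text between the tags.
    public sealed class TagSpan
    {
        public string Tag;
        public int Start;
        public string Outer;
        public string Inner;
    }

    public static List<TagSpan> Scan(string text)
    {
        var spans = new List<TagSpan>();
        var open = new Stack<Match>();

        // Find every opening (<123>) or closing (</123>) numeric tag.
        foreach (Match m in Regex.Matches(text, @"<(/?)(\d+)>"))
        {
            bool isClosing = m.Groups[1].Length > 0;
            if (!isClosing) { open.Push(m); continue; }

            // Ignore a closing tag that does not pair with the innermost open tag.
            if (open.Count == 0 || open.Peek().Groups[2].Value != m.Groups[2].Value) continue;

            Match start = open.Pop();
            int innerStart = start.Index + start.Length;
            spans.Add(new TagSpan
            {
                Tag = m.Groups[2].Value,
                Start = start.Index,
                Outer = text.Substring(start.Index, m.Index + m.Length - start.Index),
                Inner = text.Substring(innerStart, m.Index - innerStart)
            });
        }

        // Inner spans close first, so sort by start offset to get document order.
        return spans.OrderBy(s => s.Start).ToList();
    }

    public static void Main()
    {
        var text = "This is a <143>sample</143> regex <143>pa<665>t</665>tern</143> selection <143>by</143> tags in <128>c#</128> and my <132>name</132> is <175>Vincent</175>";
        foreach (var span in Scan(text))
            Console.WriteLine(span.Outer);
    }
}
```

On the string from the question this lists the nested <665>t</665> element separately, right after the <143> element that contains it. Inner gives the text between the tags, which for the outer element still includes the nested markup. Filtering by tag number (143, 128, 132, 175) can be done on the Tag field afterwards.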

Upvotes: 5
