sam_33

Reputation: 595

Regular expression to match text between tags

I need help with a regular expression, as I don't know regular expressions well.

I have this regular expression:

Regex myregex = new Regex("testValue=\"(.+?)\"");

What does (.+?) indicate?

The string it matches is testValue="123e4567", and it returns 123e4567 as output.

Now I need a regular expression that matches the string "<helpMe>123e4567</helpMe>" and returns 123e4567 as output. How do I write a regular expression for that?

Upvotes: 1

Views: 157

Answers (3)

Ken Redler

Reputation: 23943

This means:

(   Begin captured group
.   Match any character
+   One or more times
?   Non-greedy quantifier
)   End captured group

In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
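To see the difference on the original pattern, here is a minimal sketch. The input line with a second quoted attribute (other="abc") is a made-up example, and the QuoteDemo class and Capture helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // Returns the first captured group, or an empty string when nothing matches.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Hypothetical input: two quoted values on the same line.
        string input = "testValue=\"123e4567\" other=\"abc\"";

        // Lazy: stops at the very next double-quote.
        Console.WriteLine(Capture("testValue=\"(.+?)\"", input));
        // prints: 123e4567

        // Greedy: runs on to the last double-quote on the line.
        Console.WriteLine(Capture("testValue=\"(.+)\"", input));
        // prints: 123e4567" other="abc
    }
}
```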

For your "helpMe" example, you'd want this regex:

<helpMe>(.+?)</helpMe>

Given this string:

<div>Something<helpMe>ABCDE</helpMe></div>

You'd get this match:

ABCDE

The value of the non-greedy quantifier is evident in this variation:

Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>

The greedy capture would look like this:

ABCDE</helpMe><helpMe>FGHIJ
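The same comparison can be run in C#. This is only a sketch: HelpMeDemo and CaptureAll are hypothetical names, and Regex.Matches is used so that every match is listed rather than just the first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HelpMeDemo
{
    // Collects group 1 from every match of the pattern in the input.
    public static string[] CaptureAll(string pattern, string input)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Groups[1].Value;
        }
        return values;
    }

    public static void Main()
    {
        string input = "<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>";

        // Lazy: each helpMe element is matched separately.
        Console.WriteLine(string.Join(" | ", CaptureAll("<helpMe>(.+?)</helpMe>", input)));
        // prints: ABCDE | FGHIJ

        // Greedy: one match spanning from the first opening tag to the last closing tag.
        Console.WriteLine(string.Join(" | ", CaptureAll("<helpMe>(.+)</helpMe>", input)));
        // prints: ABCDE</helpMe><helpMe>FGHIJ
    }
}
```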

Interactive regex testers are useful for playing with these variations.

Upvotes: 4

Brad Christie

Reputation: 101614

Ken Redler has a great answer regarding your first question. For the second question try:

<(helpMe)>(.*?)</\1>

Using the back reference \1 you can find values between the set of matching tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference re-uses the first group's match (in this case the tag name).

Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.
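A short sketch of the named-group version follows. BackrefDemo and TagValue are hypothetical names, and the second pattern, which swaps the literal tag name for \w+, is my own variation added to show why the back reference matters; it is not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackrefDemo
{
    // Returns the named group "value", or an empty string when nothing matches.
    public static string TagValue(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["value"].Value : "";
    }

    public static void Main()
    {
        // Verbatim strings (@"...") keep \1 from being read as a C# escape.
        string fixedTag = @"<(helpMe)>(?<value>.*?)</\1>";
        Console.WriteLine(TagValue(fixedTag, "<helpMe>123e4567</helpMe>"));
        // prints: 123e4567

        // Assumed variation: any tag name, closed by the same name via \1.
        string anyTag = @"<(\w+)>(?<value>.*?)</\1>";
        Console.WriteLine(TagValue(anyTag, "<helpMe>123e4567</helpMe>"));
        // prints: 123e4567

        // Mismatched tags do not match, so the result is empty.
        Console.WriteLine("[" + TagValue(anyTag, "<a>x</b>") + "]");
        // prints: []
    }
}
```

In .NET, unnamed groups are numbered before named ones, so \1 still refers to the tag-name group even though the named group appears in the same pattern.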

Upvotes: 2

alexn

Reputation: 59012

What does (.+?) indicate?

It means match any character except a newline (.) one or more times, but as few times as possible (+?). The parentheses capture whatever was matched.

A simple regex to match your second string would be

<helpMe>([a-z0-9]+)<\/helpMe>

This matches one or more lowercase letters a-z or digits between <helpMe> and </helpMe>. Uppercase letters and other characters are not matched.

The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.
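As a rough illustration of what the character class accepts, here is a small sketch. CharClassDemo and Capture are hypothetical names, and the RegexOptions.IgnoreCase call is an assumed option for inputs with uppercase letters; it is not something the pattern above does by itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Returns group 1 for the pattern above, or an empty string when nothing matches.
    public static string Capture(string input, RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(input, @"<helpMe>([a-z0-9]+)<\/helpMe>", options);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Lowercase letters and digits match.
        Console.WriteLine("[" + Capture("<helpMe>123e4567</helpMe>") + "]");
        // prints: [123e4567]

        // Uppercase letters fall outside [a-z0-9], so there is no match.
        Console.WriteLine("[" + Capture("<helpMe>ABCDE</helpMe>") + "]");
        // prints: []

        // With IgnoreCase the whole pattern, tag name included, ignores case.
        Console.WriteLine("[" + Capture("<helpMe>ABCDE</helpMe>", RegexOptions.IgnoreCase) + "]");
        // prints: [ABCDE]
    }
}
```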

Upvotes: 0
