Trey Balut

Reputation: 1395

Regex: Find Text Between Tags in C#

I want to strip the HTML tags and return only the text between them. Here is what I'm currently using:

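using System.Text.RegularExpressions;
string regularExpressionPattern1 = @"<td(.*?)<\/td>";
Regex regex = new Regex(regularExpressionPattern1, RegexOptions.Singleline);
// Each Match's Value is the whole "<td>...</td>" text, tags included.
MatchCollection collection = regex.Matches(value.ToString());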

I currently get <td>13</td>, and I just want 13.

Thanks,

Upvotes: 8

Views: 20732

Answers (4)

Mike Webb

Reputation: 8993

You can use a lookbehind, (?<=...), and a lookahead, (?=...), like this:

(?<=<td>)(.*?)(?=<\/td>)

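Lookarounds are zero-width: they must match, but they are not part of the result, so each Match's Value is just the text between the tags. Note that (?<=<td>) only matches a bare <td> tag, not one with attributes such as <td class="x">. More info on regex lookahead and lookbehind can be found here.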
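A minimal, self-contained C# sketch of the lookaround approach follows. The class name LookaroundDemo, the ExtractCells method, and the sample row are hypothetical. The pattern is the one above with the unnecessary \/ escape dropped (/ is not special in .NET regex). It assumes plain <td> tags without attributes, because the lookbehind requires the literal text <td>.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // (?<=<td>) and (?=</td>) are zero-width: they must match around the text,
    // but they are not included in Match.Value.
    private static readonly Regex CellText =
        new Regex(@"(?<=<td>)(.*?)(?=</td>)", RegexOptions.Singleline);

    // Returns the text of every <td>...</td> cell (bare <td> tags only).
    public static List<string> ExtractCells(string html)
    {
        var result = new List<string>();
        foreach (Match m in CellText.Matches(html))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string html = "<tr><td>13</td><td>42</td></tr>";
        foreach (string cell in ExtractCells(html))
        {
            Console.WriteLine(cell);
        }
    }
}
```

Running Main prints 13 and then 42. Since m.Value is read directly, the parentheses around .*? are not actually needed with lookarounds; (?<=<td>).*?(?=</td>) behaves the same.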

Also, a good regex tester can be found here; I use it to test all my regex patterns while I'm writing them.

Upvotes: 6

jessehouwing

Reputation: 114461

Using the HTML Agility Pack, this is really easy:

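 using HtmlAgilityPack;
 var doc = new HtmlDocument();
 doc.LoadHtml(value);   // LoadHtml returns void, so create the document first
 var nodes = doc.DocumentNode.SelectNodes("//td//text()");   // null if nothing matches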

This puts the text nodes inside every <td> into the nodes variable; each node's InnerText holds the cell text.

Upvotes: 3

Rakhitha

Reputation: 328

Use match.Groups[1].Value instead of match.Value.

Upvotes: -1

Yevgeniy.Chernobrivets

Reputation: 3194

You need the value of the group, not of the whole match. Try this:

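// Groups[1] is only the cell text if the pattern captures just that, e.g. @"<td\b[^>]*>(.*?)</td>";
// with the question's @"<td(.*?)<\/td>", Groups[1].Value for <td>13</td> is ">13".
Match m = collection[0];
var stripped = m.Groups[1].Value;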
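The same capture-group idea applied to every match rather than only collection[0], as a hedged, self-contained sketch. It assumes a modified pattern, <td\b[^>]*>(.*?)</td>, which consumes the whole opening tag (attributes included) so that group 1 holds only the cell text; with the question's original pattern, group 1 also picks up the > that closes the opening tag. GroupDemo, ExtractCells, OriginalGroup1 and the sample input are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // <td\b[^>]*> consumes the full opening tag, with or without attributes;
    // (.*?) lazily captures the cell text into group 1, up to the first </td>.
    // IgnoreCase also accepts <TD>...</TD>.
    private static readonly Regex Cell =
        new Regex(@"<td\b[^>]*>(.*?)</td>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Collects Groups[1].Value for each match instead of the whole match.
    public static List<string> ExtractCells(string html)
    {
        var result = new List<string>();
        foreach (Match m in Cell.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    // The question's original pattern, kept to show what its group 1 contains.
    public static string OriginalGroup1(string html)
    {
        Match m = Regex.Match(html, @"<td(.*?)<\/td>", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical sample input; the second cell has an attribute.
        string html = "<tr><td>13</td><td align=\"right\">42</td></tr>";
        Console.WriteLine(string.Join(",", ExtractCells(html)));
        Console.WriteLine(OriginalGroup1("<td>13</td>"));
    }
}
```

Running Main prints 13,42 and then >13, which is why the original pattern needs adjusting before Groups[1] gives the bare value. As with any regex over HTML, this still breaks on nested tables, because the lazy group stops at the first </td>.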

Upvotes: 7
