Reputation: 1395
I want to strip the HTML tags and return only the text between them. Here is what I'm currently using:
string regularExpressionPattern1 = @"<td(.*?)<\/td>";
Regex regex = new Regex(regularExpressionPattern1, RegexOptions.Singleline);
MatchCollection collection = regex.Matches(value.ToString());
I currently get <td>13</td>, and I just want 13.
Thanks,
Upvotes: 8
Views: 20732
Reputation: 8993
You can use look-behind ?<= and look-ahead ?= like this:
(?<=<td>)(.*?)(?=<\/td>)
That should give you just the text between the tags. More info on Regex and look-ahead/look-behind can be found Here.
Also, a good Regex tester can be found Here. I use it to test all my Regex strings when I'm writing them.
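As a rough sketch of how the lookaround pattern above behaves in C#: because the look-behind and look-ahead only assert the tags without consuming them, Match.Value itself is the text between them. The class name, the ExtractCells helper, and the sample input are made up for illustration and are not from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Hypothetical helper: returns the text of every <td>...</td> cell.
    public static List<string> ExtractCells(string html)
    {
        // (?<=<td>) requires "<td>" just before the match,
        // (?=<\/td>) requires "</td>" just after it; neither is part of the match.
        var regex = new Regex(@"(?<=<td>)(.*?)(?=<\/td>)", RegexOptions.Singleline);

        var cells = new List<string>();
        foreach (Match m in regex.Matches(html))
        {
            cells.Add(m.Value); // the whole match is already the inner text
        }
        return cells;
    }

    static void Main()
    {
        // Assumed sample input; the question only shows a single <td>13</td> cell.
        foreach (string cell in ExtractCells("<tr><td>13</td><td>42</td></tr>"))
        {
            Console.WriteLine(cell);
        }
    }
}
```

Note that this only matches a bare <td> tag; a cell with attributes (for example a class) would not satisfy the look-behind as written.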
Upvotes: 6
Reputation: 114461
Using the HTML Agility Pack, this would be really easy:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(value); // value is the HTML string; LoadHtml returns void
var nodes = doc.DocumentNode.SelectNodes("//td//text()");
This puts the text nodes inside each td element in the nodes variable.
Upvotes: 3
Reputation: 3194
You need to read the value of the capture group, not of the whole match. Try this:
Match m = collection[0];
var stripped = m.Groups[1].Value;
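A minimal sketch of the same idea applied to every match, not just collection[0]. It assumes the pattern is <td>(.*?)<\/td>, with the closing > after <td; with the pattern exactly as shown in the question, group 1 would also capture that >. The class name, the CellTexts helper, and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Hypothetical helper: collects the first capture group of every match.
    public static List<string> CellTexts(string html)
    {
        var regex = new Regex(@"<td>(.*?)<\/td>", RegexOptions.Singleline);

        var result = new List<string>();
        foreach (Match m in regex.Matches(html))
        {
            // Groups[0] is the whole match, e.g. "<td>13</td>";
            // Groups[1] is what the first pair of parentheses captured.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        // Assumed sample input for illustration only.
        foreach (string text in CellTexts("<td>13</td><td>42</td>"))
        {
            Console.WriteLine(text);
        }
    }
}
```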
Upvotes: 7