Reputation: 1439
I need to strip link tags from a body of text but keep the anchor text. For example:
<a href ="">AnchorText</a>
needs to become just:
AnchorText
I was considering using the following RegEx:
<(.{0}|/)(a|A).*?>
Is a RegEx the best way to go about this? If so, is the above RegEx pattern adequate? If RegEx isn't the way to go, what's a better solution? This needs to be done server side.
Upvotes: 3
Views: 3090
Reputation: 41
I have been trying to do the same and found the following solution:
Upvotes: 1
Reputation: 160852
You could just use HtmlAgilityPack:
string sampleHtml = "<a href =\"\">AnchorText</a>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sampleHtml);
string text = doc.DocumentNode.InnerText; //output: AnchorText
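Note that InnerText on the document node drops every tag in the input, not only the links. If the rest of the markup has to survive, one option is to replace each a element with a text node. A minimal sketch, assuming the HtmlAgilityPack package is referenced; the class and method names (AnchorUnwrapper, UnwrapAnchors) are made up for illustration:

```csharp
using HtmlAgilityPack;

public static class AnchorUnwrapper
{
    // Returns the HTML with every <a> element replaced by its text content.
    public static string UnwrapAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // With default settings SelectNodes returns null, not an empty
        // collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                // Swap the <a> element for a text node holding its text.
                // Any markup nested inside the link is dropped as well.
                HtmlNode text = doc.CreateTextNode(anchor.InnerText);
                anchor.ParentNode.ReplaceChild(text, anchor);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling AnchorUnwrapper.UnwrapAnchors on a fragment such as a paragraph containing a link and a b element is meant to remove only the link tags and leave the b element in place.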
Upvotes: 3
Reputation: 92976
Your regex will do the job. You can write it a bit simpler as
</?(a|A).*?>
/? means zero or one /, which is equivalent to your (.{0}|/)
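One caveat that applies to both patterns: because .*? follows the letter directly, they also match any other tag whose name starts with a, such as <abbr> or <address>. A minimal sketch of a tighter pattern, assuming the server side is .NET and System.Text.RegularExpressions is available; the names LinkStripper and StripLinks are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // </?a        opening or closing tag named "a"
    // (?=[\s/>])  the tag name must end here, so <abbr>, <address> etc. are skipped
    // [^>]*       any attributes, up to the closing >
    // IgnoreCase replaces the (a|A) alternation.
    private static readonly Regex AnchorTag =
        new Regex(@"</?a(?=[\s/>])[^>]*>", RegexOptions.IgnoreCase);

    // Removes the <a ...> and </a> tags and keeps everything between them.
    public static string StripLinks(string html)
    {
        return AnchorTag.Replace(html, string.Empty);
    }
}
```

LinkStripper.StripLinks("<a href =\"\">AnchorText</a>") returns AnchorText. Like any regex over HTML, this is a sketch with limits: an attribute value that contains a literal > will cut the match short.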
Upvotes: 5
Reputation: 41236
Use jQuery replaceWith:
$('a').replaceWith(function()
{
    return $('<span/>').text($(this).text());
});
Assuming you are doing this on the client side.
Upvotes: 1
Reputation: 2542
I think a regex is the best way to accomplish this, and your pattern looks like it should work.
Upvotes: 1