Reputation: 7163
How can I strip all hyperlink <a> tags from some text? I want to remove the whole element: the opening <a> tag, the text, image or whatever else is being linked, and the closing </a> tag.
E.g.
<a href="http://stackoverflow.com">Click here</a>
<a href="http://stackoverflow.com"><img src="http://stackoverflow.com" alt = "blah"></a>
i.e. remove the whole lot, not just the tags.
Any ideas how to do this?
Thanks
Upvotes: 3
Views: 29
Reputation: 6294
You can try a regular expression to replace your tags. My regex isn't the best but this should get you close.
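string result = System.Text.RegularExpressions.Regex.Replace(
    input,
    @"<a\b[^>]*>.*?</a>",   // \b keeps <abbr>, <address> etc. from matching
    string.Empty,
    // IgnoreCase also catches <A>; Singleline lets . span line breaks
    System.Text.RegularExpressions.RegexOptions.IgnoreCase |
    System.Text.RegularExpressions.RegexOptions.Singleline);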
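As a rough, self-contained sketch of the same idea, the pattern can be wrapped in a small helper. The class name `LinkStripper` and the method `RemoveLinks` are made up for illustration, and the pattern assumes simple markup: no `>` inside attribute values and no nested or commented-out links.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // <a\b      : an opening <a tag; \b stops <abbr>, <address>, ... from matching
    // [^>]*>    : the rest of the opening tag (assumes no '>' inside attribute values)
    // .*?       : the linked content, matched lazily so each link ends at its own </a>
    // </a\s*>   : the closing tag
    private static readonly Regex AnchorElement = new Regex(
        @"<a\b[^>]*>.*?</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the input with every <a>...</a> element removed.
    public static string RemoveLinks(string html)
    {
        return AnchorElement.Replace(html, string.Empty);
    }
}
```

Calling `LinkStripper.RemoveLinks(text)` on either of the two examples from the question should leave an empty string, while text outside the links is kept as it is.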
Upvotes: 0
Reputation: 85096
Obligatory "don't use regex to parse HTML" warning: see the question "RegEx match open tags except XHTML self-contained tags".
I would recommend either converting the input to XHTML and using XPath, or taking a look at HtmlAgilityPack. I have used both methods for parsing and modifying HTML in the past, and they are far more flexible and robust than regex.
Here is an example that should get you started with HtmlAgilityPack:
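using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so check before looping.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        link.Remove(); // removes the <a> element together with its contents
    }
}
doc.Save("file.htm");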
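For the XHTML route mentioned above, here is a minimal sketch that uses only the XML classes shipped with .NET. It assumes the input is already a well-formed XHTML fragment (closed tags, quoted attributes, no undeclared entities such as `&nbsp;`, no namespace on the elements); real-world HTML usually needs converting first. The names `XhtmlLinkStripper` and `RemoveLinks` are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XhtmlLinkStripper
{
    // Hypothetical helper: removes every <a> element from a well-formed XHTML fragment.
    public static string RemoveLinks(string xhtmlFragment)
    {
        // Wrap the fragment in a dummy root so several top-level nodes still parse.
        XDocument doc = XDocument.Parse(
            "<root>" + xhtmlFragment + "</root>",
            LoadOptions.PreserveWhitespace);

        // Take a snapshot with ToList() so removing nodes does not disturb the query.
        foreach (XElement link in doc.XPathSelectElements("//a").ToList())
        {
            link.Remove(); // drops the element together with its contents
        }

        // Serialize the children of the dummy root back into a string.
        return string.Concat(
            doc.Root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Unlike the regex approach, this fails loudly (an `XmlException`) on markup it cannot parse instead of silently removing the wrong span.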
Upvotes: 1
Reputation: 69372
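From what I understand, something like this should work. The match has to be lazy and end at the closing </a>; a greedy .* would run on to the last > on the line and delete unrelated content.
string linksRemoved = Regex.Replace(withLinks, @"<a\b[^>]*>.*?</a>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);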
Upvotes: 0