thegunner

Reputation: 7163

Strip any hyperlinks, and the text within them, from a piece of text

I'd like to know how to strip every hyperlink (<a> tag) from within some text: the whole element, including whatever text or image is being linked, up to and including the closing </a> tag.

E.g.

<a href="http://stackoverflow.com">Click here</a>        
<a href="http://stackoverflow.com"><img src="http://stackoverflow.com" alt = "blah"></a>

i.e. remove the whole lot.

Any ideas how to do this?

Thanks

Upvotes: 3

Views: 29

Answers (3)

Jay

Reputation: 6294

You can try a regular expression to replace your tags. My regex isn't the best but this should get you close.

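System.Text.RegularExpressions.Regex.Replace(
     input, 
     @"<a\b[^>]*>.*?</a\s*>",   // \b stops <abbr>, <article> etc. from matching
     string.Empty,
     System.Text.RegularExpressions.RegexOptions.IgnoreCase |    // also match <A ...>
     System.Text.RegularExpressions.RegexOptions.Singleline);    // let .*? span line breaks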
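A self-contained sketch of the same idea, run against the two sample links from the question. The names `LinkStripper` and `StripLinks` are made up for illustration. The `\b` and the `IgnoreCase`/`Singleline` options are assumptions about what you'd want: leave `<abbr>` alone, and still catch uppercase or multi-line links. Like any regex approach, it will probably trip over malformed markup or attribute values that contain `>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // <a\b      an opening "a" tag, but not <abbr>, <article>, ...
    // [^>]*>    its attributes, up to the end of the opening tag
    // .*?       the linked text or image, as little as possible
    // </a\s*>   the closing tag
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*>.*?</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the input with every <a>...</a> element removed.
    public static string StripLinks(string input)
    {
        return AnchorPattern.Replace(input, string.Empty);
    }

    public static void Main()
    {
        string html =
            "Before <a href=\"http://stackoverflow.com\">Click here</a> middle " +
            "<a href=\"http://stackoverflow.com\"><img src=\"http://stackoverflow.com\" alt = \"blah\"></a> after";

        // Both links, including the linked text and image, are dropped.
        Console.WriteLine(StripLinks(html));
    }
}
```

The surrounding text is left untouched, so any whitespace that sat on either side of a link stays where it was.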

Upvotes: 0

Abe Miessler

Reputation: 85096

Obligatory "don't use regex to parse HTML" warning: see "RegEx match open tags except XHTML self-contained tags".

I would recommend either converting to XHTML and using XPath, or taking a look at the HtmlAgilityPack. I have used both methods for parsing and modifying HTML in the past, and they are far more flexible and robust than regex.
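For the XHTML route, a rough sketch using LINQ to XML with an XPath query might look like the following. It assumes the input is already well-formed XHTML with a single root element and no default `xmlns` namespace (with a namespace, `//a` would not match without extra handling). `XhtmlLinkRemover` and `RemoveLinks` are hypothetical names.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XhtmlLinkRemover
{
    // Assumes well-formed XHTML without a default namespace.
    public static string RemoveLinks(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Select every <a> element, then detach each one
        // (together with its children) from the tree.
        doc.XPathSelectElements("//a").Remove();

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xhtml =
            "<p>Before <a href=\"http://stackoverflow.com\">Click here</a> after</p>";
        Console.WriteLine(RemoveLinks(xhtml));
    }
}
```

Note that `XDocument.Parse` throws on ordinary "tag soup" HTML (for example an unclosed `<img>`), which is why the conversion to XHTML has to happen first.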

Here is an example that should get you started with HtmlAgilityPack:

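 // using HtmlAgilityPack;
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         link.Remove(); // drops the <a> element and everything inside it
     }
 }
 doc.Save("file.htm");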
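Since the question is about a piece of text rather than a file, here is a hedged variant that works on a string. It assumes the HtmlAgilityPack NuGet package is referenced; `AnchorRemover` and `RemoveAnchors` are names invented for this sketch, and `//a` is used instead of `//a[@href]` on the assumption that anchors without an `href` should go too.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class AnchorRemover
{
    public static string RemoveAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null (not an empty list) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.Remove(); // removes the element and all of its children
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string html =
            "<a href=\"http://stackoverflow.com\">Click here</a> some text " +
            "<a href=\"http://stackoverflow.com\"><img src=\"http://stackoverflow.com\" alt = \"blah\"></a>";
        Console.WriteLine(RemoveAnchors(html));
    }
}
```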

Upvotes: 1

keyboardP

Reputation: 69372

From what I understand, this should work:

string linksRemoved = Regex.Replace(withLinks, @"<a\b[^>]*>.*?</a\s*>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);

Upvotes: 0
