I need to generate an excerpt from a piece of HTML text. I can't just use the Substring method, because I could cut a tag in half. Is there a function that takes tags into consideration, so that it skips ahead until the tag ends?
Upvotes: 0
As ckittel said, an example would help. If I understood your question correctly, there is no such built-in functionality.
Depending on your needs and the kind of HTML you are processing, you could get by with a simple regular-expression-based method that strips the HTML tags from your text and decodes HTML entities:
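```csharp
using System.Text.RegularExpressions;
using System.Web; // HttpUtility lives in the System.Web assembly

public static string StripHTML(string HTMLText)
{
    // Turn <br> line breaks into newlines before the tags are removed
    string ret = HTMLText.Replace("<br>", "\n").Replace("<br />", "\n");
    // Drop every remaining tag, then decode entities such as &amp; and &quot;
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    return HttpUtility.HtmlDecode(reg.Replace(ret, ""));
}
```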
If you test it with something like the following:

```csharp
string longHtmlText = "<html>This is a &quot;<b>long &amp; bolded</b> <a href=\"http://en.wikipedia.org/wiki/HTML\">HTML</a> text</html>&quot;";
string excerpt = StripHTML(longHtmlText);
// Substring throws ArgumentOutOfRangeException if the text is shorter than 30 characters
excerpt = excerpt.Substring(0, 30) + "(..)";
```

the result is:

```
This is a "long & bolded HTML (..)
```

which should answer your question.
Just remember, as Albireo pointed out, a regex is nothing like a real HTML parser. But if you need quick HTML stripping and trimming for simple HTML text, without external components, this code may be enough for you.
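If you want to keep the markup in the excerpt instead of stripping it (closer to what the question literally asks), a hand-written scanner can copy each tag whole and count only the visible characters. This is a minimal sketch, not a real HTML parser: `HtmlExcerpt.Truncate` is a hypothetical helper, and it assumes reasonably well-formed HTML with no `>` inside attribute values, uses only a partial list of void elements, and treats `&...;` (up to ten characters) as a single visible character. Tags still open at the cut are closed, innermost first.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HtmlExcerpt
{
    // Elements that never take a closing tag (partial list, an assumption of this sketch).
    private static readonly HashSet<string> VoidTags = new HashSet<string>(
        StringComparer.OrdinalIgnoreCase) { "br", "img", "hr", "input", "meta", "link" };

    // Hypothetical helper: keeps at most maxChars visible characters,
    // copies tags whole (never cutting inside one), and closes tags left open.
    // Assumes well-formed HTML with no '>' inside attribute values.
    public static string Truncate(string html, int maxChars)
    {
        var sb = new StringBuilder();
        var open = new Stack<string>();
        int visible = 0, i = 0;
        while (i < html.Length && visible < maxChars)
        {
            char c = html[i];
            if (c == '<')
            {
                // Skip to the end of the tag and copy it unchanged.
                int end = html.IndexOf('>', i);
                if (end < 0) break; // unterminated tag: drop the rest
                string tag = html.Substring(i, end - i + 1);
                sb.Append(tag);
                TrackTag(tag, open);
                i = end + 1;
            }
            else if (c == '&')
            {
                // Treat an entity such as &amp; as one visible character.
                int semi = html.IndexOf(';', i);
                int len = (semi > i && semi - i <= 10) ? semi - i + 1 : 1;
                sb.Append(html, i, len);
                visible++;
                i += len;
            }
            else
            {
                sb.Append(c);
                visible++;
                i++;
            }
        }
        // Close whatever is still open, innermost first.
        while (open.Count > 0)
            sb.Append("</").Append(open.Pop()).Append('>');
        return sb.ToString();
    }

    // Pushes opening tags and pops matching closing tags on the stack.
    private static void TrackTag(string tag, Stack<string> open)
    {
        // Comments, doctypes and self-closing tags never need closing.
        if (tag.StartsWith("<!", StringComparison.Ordinal) ||
            tag.EndsWith("/>", StringComparison.Ordinal)) return;
        bool closing = tag.StartsWith("</", StringComparison.Ordinal);
        int start = closing ? 2 : 1, j = start;
        while (j < tag.Length && char.IsLetterOrDigit(tag[j])) j++;
        string name = tag.Substring(start, j - start);
        if (name.Length == 0 || VoidTags.Contains(name)) return;
        if (closing)
        {
            if (open.Count > 0 &&
                string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
                open.Pop();
        }
        else
        {
            open.Push(name);
        }
    }
}
```

For example, `HtmlExcerpt.Truncate("<p>Hello <b>bold</b> world</p>", 8)` returns `<p>Hello <b>bo</b></p>`: the cut lands inside the text of `<b>`, and both open tags are closed afterwards.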
Upvotes: 0
There is no built-in "function" that does what you want; you must use an HTML parser (e.g. the one suggested by Russ C) and iterate over all the nodes.
And please, please, please do not try it with regular expressions (I'm just being proactive here).
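To illustrate the "iterate over all the nodes" idea using only the .NET base library, this sketch swaps in System.Xml.Linq instead of an HTML parser such as the HTML Agility Pack. It is a real parser, but an XML one, so it assumes the input is well-formed XHTML that uses only XML's predefined entities (`&nbsp;`, for instance, would throw an `XmlException`). `XhtmlExcerpt.FromXhtml` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class XhtmlExcerpt
{
    // Hypothetical helper: parses the fragment as XML, walks every text node
    // in document order, and collects at most maxChars characters of text.
    public static string FromXhtml(string xhtml, int maxChars)
    {
        // Wrap in a root so fragments with several top-level nodes still parse.
        // PreserveWhitespace keeps whitespace-only text nodes, such as the
        // space between </b> and <a>, which would otherwise be discarded.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>",
                                       LoadOptions.PreserveWhitespace);
        var sb = new StringBuilder();
        foreach (XText text in root.DescendantNodes().OfType<XText>())
        {
            int remaining = maxChars - sb.Length;
            if (remaining <= 0) break;
            // XText.Value is already entity-decoded (&amp; becomes &).
            string value = text.Value;
            sb.Append(value, 0, Math.Min(remaining, value.Length));
        }
        return sb.ToString();
    }
}
```

For instance, `XhtmlExcerpt.FromXhtml("<p>a <i>b</i> c</p>", 4)` returns `"a b "`. Because the text is taken node by node, a tag can never be cut in half, which is the property the regex approach only approximates.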
Upvotes: 1
I think the HTML Agility Pack will provide the functionality you require. See also the question "Getting the text from a node using HtmlAgilityPack".
Upvotes: 1