Reputation: 37366

Trim part of HTML text in C# without trimming an HTML tag

I need to generate an excerpt from a piece of HTML text. I can't just use the Substring method because I might cut a tag in half. Is there a function that takes tags into consideration, so that it skips ahead until the tag ends?

Upvotes: 0

Answers (3)

Fulvio

Reputation: 1647

An example would help, as ckittel said. If I understood your question correctly, there is no such built-in functionality.

Depending on your needs and the kind of HTML you are processing, a simple method based on a regular expression, which strips the HTML tags from your text and decodes HTML entities, may be enough:

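// Needs the System.Text.RegularExpressions (Regex) and System.Web (HttpUtility) namespaces.
public static string StripHTML(string HTMLText)
{
    // Turn line breaks into newlines before the remaining tags are removed.
    string ret = HTMLText.Replace("<br>", "\n").Replace("<br />", "\n");
    // Remove anything that looks like a tag, then decode entities such as &amp;.
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    return HttpUtility.HtmlDecode(reg.Replace(ret, ""));
}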

If you test this code with something like the following code..

string longHtmlText = "<html>This is a &quot;<b>long &amp; bolded</b> <a href=\"http://en.wikipedia.org/wiki/HTML\">HTML</a> text</html>&quot;";
string excerpt = StripHTML(longHtmlText);
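// Substring throws if the text is shorter than 30 characters, so only cut longer text.
if (excerpt.Length > 30) excerpt = excerpt.Substring(0, 30) + "(..)";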

..the result would be..

This is a "long & bolded HTML (..)

..which should answer your question.

Just remember, as Albireo pointed out, that a regex is no substitute for real HTML parsing. But if you need quick HTML stripping and trimming (for simple HTML text) without external components, this code may be enough for you.
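
If the excerpt has to keep its markup, which is what the question literally asks for, one option is to walk the string, count only the visible characters, and copy tags whole. The following is a minimal sketch of that idea, not a built-in API: `HtmlExcerpt`, `Truncate`, `TrackTag` and `EntityLength` are names made up for this example, the list of void elements is deliberately incomplete, and it assumes reasonably well-formed HTML with no `>` inside attribute values or comments. For anything messier, a real HTML parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HtmlExcerpt
{
    // Elements that never get a closing tag (illustrative, incomplete list).
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "br", "hr", "img", "input", "meta", "link" };

    // Keeps at most maxTextChars visible characters, copies tags whole,
    // and closes any elements that are still open at the cut.
    public static string Truncate(string html, int maxTextChars)
    {
        var output = new StringBuilder();
        var openTags = new Stack<string>();
        int textCount = 0;
        int i = 0;

        while (i < html.Length && textCount < maxTextChars)
        {
            if (html[i] == '<')
            {
                int end = html.IndexOf('>', i);
                if (end < 0) break; // unterminated tag: stop instead of emitting half a tag
                string tag = html.Substring(i, end - i + 1);
                output.Append(tag);
                TrackTag(tag, openTags);
                i = end + 1;
            }
            else
            {
                // An entity such as &amp; is copied whole and counts as one character.
                int len = html[i] == '&' ? EntityLength(html, i) : 1;
                output.Append(html, i, len);
                textCount++;
                i += len;
            }
        }

        // Close whatever is still open, innermost element first.
        while (openTags.Count > 0)
            output.Append("</").Append(openTags.Pop()).Append('>');
        return output.ToString();
    }

    // Pushes opening tags and pops matching closing tags.
    private static void TrackTag(string tag, Stack<string> open)
    {
        if (tag.StartsWith("<!", StringComparison.Ordinal) ||
            tag.StartsWith("<?", StringComparison.Ordinal) ||
            tag.EndsWith("/>", StringComparison.Ordinal))
            return; // comment, doctype, processing instruction or self-closing tag

        bool closing = tag.StartsWith("</", StringComparison.Ordinal);
        int start = closing ? 2 : 1;
        int j = start;
        while (j < tag.Length && char.IsLetterOrDigit(tag[j])) j++;
        string name = tag.Substring(start, j - start);
        if (name.Length == 0 || VoidElements.Contains(name)) return;

        if (!closing)
            open.Push(name);
        else if (open.Count > 0 &&
                 string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
            open.Pop();
    }

    // Returns the length of the entity starting at s[start] ('&'),
    // or 1 if the text there does not look like an entity.
    private static int EntityLength(string s, int start)
    {
        int j = start + 1;
        if (j < s.Length && s[j] == '#') j++;
        int nameStart = j;
        while (j < s.Length && j - start <= 10 && char.IsLetterOrDigit(s[j])) j++;
        bool isEntity = j < s.Length && s[j] == ';' && j > nameStart;
        return isEntity ? j - start + 1 : 1;
    }
}
```

For example, `HtmlExcerpt.Truncate("<p>Hello <b>big</b> world</p>", 7)` returns `<p>Hello <b>b</b></p>`. The sketch does not append a "(..)" marker itself; where that marker should go relative to the closing tags is left to the caller.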

Upvotes: 0