ryudice

Reputation: 37366

Trim part of HTML text in C# without cutting through an HTML tag

I need to generate an excerpt from a piece of HTML text. I can't just use the Substring method because the cut could land in the middle of a tag. Is there a function that takes tags into consideration, so that it skips ahead until the tag ends?

Upvotes: 0

Views: 1049

Answers (3)

Fulvio

Reputation: 1647

An example would help, as ckittel said. If I understood your question correctly, there is no such built-in functionality.

Depending on your needs and the kind of HTML you are processing, you could get by with a simple regular-expression-based method that strips the HTML tags from your text and decodes HTML entities:

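using System.Text.RegularExpressions;
using System.Web; // HttpUtility

public static string StripHTML(string HTMLText)
{
    // Turn <br>, <br/> and <br /> (in any case) into newlines before stripping.
    string ret = Regex.Replace(HTMLText, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);
    // Remove every remaining tag, then decode entities such as &amp; and &quot;.
    return HttpUtility.HtmlDecode(Regex.Replace(ret, "<[^>]+>", ""));
}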

If you test this code with something like the following:

string longHtmlText = "<html>This is a &quot;<b>long &amp; bolded</b> <a href=\"http://en.wikipedia.org/wiki/HTML\">HTML</a> text&quot;</html>";
string excerpt = StripHTML(longHtmlText);
// Guard against text shorter than 30 characters, where Substring(0, 30) would throw.
excerpt = excerpt.Substring(0, Math.Min(30, excerpt.Length)) + "(..)";

The result would be:

This is a "long & bolded HTML (..)

That should answer your question.

Just remember, as Albireo noted, that a regex is no substitute for an HTML parser. But if you need quick HTML stripping and trimming (for simple HTML text) without external components, this code may be enough for you.
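If the excerpt has to keep its markup rather than lose it, the same "simple HTML only" idea can be sketched with a small hand-written scanner instead of Substring. This is only a sketch: `HtmlExcerpt.Truncate` is a hypothetical helper, not a framework method, and it assumes well-formed markup with no `>` inside attribute values, no comments and no script or style blocks. It copies tags through whole, counts only the characters outside tags (an entity counts as one), and closes any elements left open at the cut.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HtmlExcerpt
{
    // Elements that never take a closing tag (a partial list, for illustration).
    private static readonly string[] VoidElements = { "br", "hr", "img", "input", "meta", "link" };

    // Hypothetical helper: keeps at most maxVisibleChars characters of text,
    // never cutting inside a tag or an entity.
    public static string Truncate(string html, int maxVisibleChars)
    {
        var output = new StringBuilder();
        var openTags = new Stack<string>();   // names of elements still open
        int visible = 0;
        int i = 0;

        while (i < html.Length && visible < maxVisibleChars)
        {
            if (html[i] == '<')
            {
                int close = html.IndexOf('>', i);
                if (close < 0) break;          // unterminated tag: stop copying here
                string tag = html.Substring(i, close - i + 1);
                output.Append(tag);            // tags are copied whole and not counted
                TrackTag(tag, openTags);
                i = close + 1;
            }
            else
            {
                int length = 1;
                if (html[i] == '&')
                {
                    // Treat "&name;" as one visible character when a ';' follows
                    // within 10 characters and nothing tag-like or blank intervenes.
                    int semi = html.IndexOf(';', i);
                    bool isEntity = semi > i + 1 && semi - i <= 10 &&
                        html.IndexOfAny(new[] { ' ', '<', '&' }, i + 1, semi - i - 1) < 0;
                    if (isEntity) length = semi - i + 1;
                }
                output.Append(html, i, length);
                visible++;
                i += length;
            }
        }

        // Close whatever is still open, innermost element first.
        while (openTags.Count > 0)
            output.Append("</").Append(openTags.Pop()).Append('>');
        return output.ToString();
    }

    private static void TrackTag(string tag, Stack<string> openTags)
    {
        if (tag.StartsWith("</", StringComparison.Ordinal))
        {
            if (openTags.Count > 0) openTags.Pop();
            return;
        }
        // Doctype declarations and self-closing tags open nothing.
        if (tag.StartsWith("<!", StringComparison.Ordinal) ||
            tag.EndsWith("/>", StringComparison.Ordinal)) return;

        // Element name: the characters after '<' up to the first blank or '>'.
        int nameEnd = tag.IndexOfAny(new[] { ' ', '\t', '\r', '\n', '>' }, 1);
        string name = tag.Substring(1, nameEnd - 1);
        if (name.Length > 0 && Array.IndexOf(VoidElements, name.ToLowerInvariant()) < 0)
            openTags.Push(name);
    }
}
```

For example, `HtmlExcerpt.Truncate("<p>Hello <b>brave new</b> world</p>", 9)` gives `<p>Hello <b>bra</b></p>`. Anything messier than this kind of input is where a real parser earns its keep.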

Upvotes: 0

Albireo

Reputation: 11085

There is no "function" that does what you want; you must use an HTML parser (e.g. the one suggested by Russ C) and iterate over all the nodes.

And please, please, please do not try this with regular expressions (I'm just being proactive here).
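To give an idea of what "iterate over all the nodes" could look like, here is a rough sketch. Note the swap: it uses LINQ to XML from the framework as a stand-in for a real HTML parser, so it only accepts well-formed XHTML (one root element, every tag closed, no HTML-only entities such as `&nbsp;`). `XhtmlExcerpt.Truncate` is a hypothetical helper name. With the HTML Agility Pack the walk would have the same shape, only over its own node types.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlExcerpt
{
    // Hypothetical helper: parses well-formed XHTML, then walks the nodes in
    // document order, keeping at most maxChars characters of text and dropping
    // every node that comes after the cut.
    public static string Truncate(string xhtml, int maxChars)
    {
        XElement root = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace);
        int remaining = maxChars;
        Trim(root, ref remaining);
        return root.ToString(SaveOptions.DisableFormatting);
    }

    private static void Trim(XElement element, ref int remaining)
    {
        // ToList() takes a snapshot so nodes can be removed while iterating.
        foreach (XNode node in element.Nodes().ToList())
        {
            if (remaining <= 0)
            {
                node.Remove();                       // past the cut: drop the node
            }
            else if (node is XText text)
            {
                // Shorten the text node that straddles the cut.
                if (text.Value.Length > remaining)
                    text.Value = text.Value.Substring(0, remaining);
                remaining -= text.Value.Length;
            }
            else if (node is XElement child)
            {
                Trim(child, ref remaining);          // descend; tags stay intact
            }
        }
    }
}
```

For example, `XhtmlExcerpt.Truncate("<p>Hello <b>brave new</b> world</p>", 9)` gives `<p>Hello <b>bra</b></p>`. Because the cut is made on text nodes only, a tag can never be split, and the serializer closes every element for you.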

Upvotes: 1

Russ Clarke

Reputation: 17909

I think the HTML Agility Pack will provide the functionality you require:

How to use HTML Agility pack

and:

Getting the text from a node using HtmlAgilityPack

Upvotes: 1
