Rasmus Christensen

Reputation: 8531

Read <body> tag of HTML file using C#

I need to get all the content inside the body tag of an HTML file using C#. Are there any good and effective ways of doing this?

Upvotes: 2

Views: 15204

Answers (7)

Maxim Zabolotskikh

Reputation: 3367

To save you the math in the accepted answer:

var start = html.IndexOf("<body>") + "<body>".Length;
var end = html.IndexOf("</body>");
var result = html.Substring(start, end - start);

Mind that it's not 100% bulletproof:

  • It will fail on CDATA blocks containing <body>
  • It will fail if you have something like <body lang="en">

So all in all, you are probably better off with the HTML Agility Pack unless you know for sure what HTML you are working with.
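For the second case above, a minimal sketch that tolerates attributes and upper-case tags might look like the following. `BodyExtractor` and `ExtractBody` are hypothetical names chosen for illustration, not part of any library, and the approach is still plain string searching: it assumes no attribute value contains a `>` character and does not handle the CDATA case.

```csharp
using System;

public static class BodyExtractor
{
    // Hypothetical helper, not part of any library.
    // Returns the text between the first <body ...> opening tag and the
    // last </body closing tag, or null if either one is missing.
    public static string ExtractBody(string html)
    {
        // Case-insensitive search so <BODY> is found as well.
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // Skip past any attributes to the '>' that ends the opening tag.
        // Assumption: no attribute value contains a '>' character.
        int start = html.IndexOf('>', open);
        if (start < 0) return null;
        start++;

        int end = html.LastIndexOf("</body", StringComparison.OrdinalIgnoreCase);
        if (end < start) return null;

        return html.Substring(start, end - start);
    }
}
```

Calling `BodyExtractor.ExtractBody(html)` on a string read with `File.ReadAllText` returns the inner markup, or `null` when no body element is found.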

Upvotes: 0

Maghalakshmi Saravana

Reputation: 813

Reading the HTML file into a string and getting the body tag content using C#, without the HTML Agility Pack:

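       private void Button_Click(object sender, RoutedEventArgs e)
        {
            // Requires: using System.IO; using System.Text.RegularExpressions;
            string filepath = @"C:\Users\Testing\Documents\sample1.txt";
            string htmlString = File.ReadAllText(filepath);
            string htmlTagPattern = "<.*?>";
            // Singleline lets '.' match newlines, so the body may span several lines.
            Regex oRegex = new Regex("<body.*?>(.*?)</body>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
            // Keep only the captured body content (an empty string if there is no <body>).
            htmlString = oRegex.Match(htmlString).Groups[1].Value;
            // Strip the remaining tags, then whitespace-only lines and &nbsp; entities.
            htmlString = Regex.Replace(htmlString, htmlTagPattern, string.Empty);
            htmlString = Regex.Replace(htmlString, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
            htmlString = htmlString.Replace("&nbsp;", string.Empty);
        }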

Upvotes: 1

marc_s

Reputation: 754468

Check out the HTML Agility Pack, which can do all sorts of HTML manipulation.

It gives you an interface somewhat similar to the XmlDocument XML handling interface:

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");

 HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("/html/body");

 if(bodyNode != null)
 {
    // do something, e.g. read bodyNode.InnerHtml for the markup inside <body>
 }

Upvotes: 9

Tomas Voracek

Reputation: 5914

Use the XML APIs with XPath (this only works if the markup is well-formed XML). For more advanced HTML manipulation, use the HTML Agility Pack.

Upvotes: 0

Bryan

Reputation: 2791

If it happens to be XHTML, then you could use XPath.
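As a rough sketch of that idea, assuming the input is well-formed XHTML in the standard `http://www.w3.org/1999/xhtml` namespace, with no HTML-only entities such as `&nbsp;` and no DOCTYPE that needs an external DTD, the body element can be selected with LINQ to XML. `XhtmlBody` and `FindBody` are hypothetical names used only for illustration.

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XhtmlBody
{
    // Hypothetical helper, not part of any library.
    // Returns the <body> element of an XHTML document, or null if none is found.
    public static XElement FindBody(string xhtml)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        // XHTML elements live in a namespace, so the XPath needs a prefix for it.
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        return doc.XPathSelectElement("/x:html/x:body", ns);
    }
}
```

The returned element's `Value` property gives the concatenated text of the body, and `Nodes()` enumerates its child nodes.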

Upvotes: 0

Dutchie432

Reputation: 29160

It's easy enough to pull the page code into a string, search for the occurrences of the strings "<body" and "</body", and do a little math to get your value.

Upvotes: 2

Darin Dimitrov

Reputation: 1038780

You may take a look at SgmlReader and HTML Agility Pack.

Upvotes: 3
