DanP

Reputation: 6478

Html Agility Pack - Get html fragment from an html document

Using the Html Agility Pack, how would I extract an HTML "fragment" from a full HTML document? For my purposes, an HTML "fragment" is defined as all content inside the <body> tags.

For example:

Sample Input:

<html>
   <head>
     <title>blah</title>
   </head>
   <body>
    <p>My content</p>
   </body>
</html>

Desired Output:

<p>My content</p>

Ideally, I'd like to return the content unaltered if it doesn't contain an <html> or <body> element (i.e. assume that I was passed a fragment in the first place if it isn't a full HTML document).

Can anyone point me in the right direction?

Upvotes: 3

Views: 6246

Answers (2)

Manish Pansiniya

Reputation: 545

I think you need to do it in pieces.

First, select the body element from the document with SelectSingleNode, as follows:

doc.DocumentNode.SelectSingleNode("//body") // returns body with entire contents :)

Then check the result for null. If no body node was found, you can assume you were passed a fragment and take the string as it is.
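Putting those two steps together might look like the sketch below. It is only a rough illustration: FragmentSketch and ExtractFragment are names I made up, it assumes the HtmlAgilityPack NuGet package is referenced, and it takes the raw HTML string as input so that the original text can be handed back untouched when there is no body element. The Trim() call is my own addition to drop the whitespace around the body content.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class FragmentSketch // hypothetical helper class, not part of the library
{
    // Returns the markup inside <body>, or the input unchanged when no <body> is found.
    public static string ExtractFragment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        var body = doc.DocumentNode.SelectSingleNode("//body");

        // No <body>: treat the input as a fragment and hand back the original string.
        return body == null ? html : body.InnerHtml.Trim();
    }
}
```

Calling FragmentSketch.ExtractFragment with a bare fragment such as "<p>My content</p>" returns the same string, since no body node is found. Returning the original string rather than the parsed document's InnerHtml avoids any re-serialization by the parser, which is what "unaltered" in the question seems to call for.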

Hope it helps :)

Upvotes: 6

Oscar Mederos

Reputation: 29803

The following will work:

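public string GetFragment(HtmlDocument document)
{
   // Look up <body> once; SelectSingleNode returns null when no node matches.
   var body = document.DocumentNode.SelectSingleNode("//body");
   // No <body>: assume the input was already a fragment and return it whole.
   return body == null ? document.DocumentNode.InnerHtml : body.InnerHtml;
}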

Upvotes: 6
