Stephen

Reputation: 537

How do I grab everything inside the BODY HTML tag (from a string) using Regex in ASP.NET C#?

{Yup, the above more or less explains it} :)

Regex oRegex = new Regex("<body.*?>(.*?)</body>", RegexOptions.Multiline);

The above doesn't seem to work if the body tag has any attributes.

Upvotes: 2

Views: 2735

Answers (3)

Stephen

Reputation: 537

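I eventually solved it by using RegexOptions.Singleline instead of RegexOptions.Multiline. Singleline lets . match newline characters too, so (.*?) can span a body that runs over several lines; Multiline only changes how ^ and $ behave.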
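A minimal sketch of the Singleline fix, assuming fairly simple markup with a single body element. The class and method names (BodyRegex, ExtractBody) are hypothetical. It also adds IgnoreCase (so <BODY> matches) and uses <body\b[^>]*> to consume the opening tag with any attributes. A regex is still fragile for real-world HTML; the parser-based answers are more robust.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the inner HTML of the first <body> element with a regex.
// Assumes reasonably simple, well-behaved markup.
public static class BodyRegex
{
    // Singleline: '.' also matches '\n', so the lazy (.*?) can cross line breaks.
    // IgnoreCase: matches <BODY>...</BODY> as well as <body>...</body>.
    // <body\b[^>]*> matches the opening tag together with any attributes.
    private static readonly Regex BodyPattern = new Regex(
        @"<body\b[^>]*>(.*?)</body>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the text between <body ...> and </body>, or null if no body element is found.
    public static string ExtractBody(string html)
    {
        Match match = BodyPattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For example, BodyRegex.ExtractBody("<html><BODY class=\"main\">\n<p>Hi</p>\n</BODY></html>") returns "\n<p>Hi</p>\n". With Multiline instead of Singleline, the same call would find no match because of the newlines.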

Upvotes: 1

Marc Gravell

Reputation: 1062770

With the HTML Agility Pack (assuming it is HTML, not XHTML):

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectSingleNode returns null when there is no <body> element
string body = doc.DocumentNode.SelectSingleNode("/html/body")?.InnerHtml;

Upvotes: 10

Welbog

Reputation: 60418

Don't use a regular expression. Use something that's meant to parse XML/HTML:

xmlDoc.LoadXml(html);  // xmlDoc is an XmlDocument instance
string body = xmlDoc.SelectSingleNode("//body").InnerXml;

Load your string into an XmlDocument (this only works if the markup is well-formed XML, such as XHTML), call SelectSingleNode (which takes an XPath expression as its parameter), then extract what you need from the resulting XmlNode.
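A self-contained sketch of the XmlDocument approach, assuming the input is well-formed XML; LoadXml throws an XmlException on ordinary HTML that is not well-formed. The names BodyXml and ExtractBody and the "x" prefix are hypothetical. One extra assumption is handled: if the root element declares the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), a plain "//body" query finds nothing, so the sketch falls back to a namespace-qualified query.

```csharp
using System.Xml;

// Hypothetical helper: returns the inner XML of <body>, or null if there is none.
// Assumes well-formed XML input; LoadXml throws XmlException otherwise.
public static class BodyXml
{
    public static string ExtractBody(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // Register a prefix for the XHTML namespace so "//x:body" can match
        // documents whose root declares xmlns="http://www.w3.org/1999/xhtml".
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        // Try the un-namespaced query first, then the namespaced one.
        XmlNode body = doc.SelectSingleNode("//body") ?? doc.SelectSingleNode("//x:body", ns);
        return body?.InnerXml;
    }
}
```

For example, BodyXml.ExtractBody("<html><body id=\"b\"><p>Hi</p></body></html>") returns "<p>Hi</p>".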

Upvotes: 4
