StackOverflower

Reputation: 5761

Parse page HTML output


I'd like to know one or more ways to parse a page's HTML output. I want to detect certain patterns in the HTML that will be sent to the client and log some information when they are present.

Upvotes: 1

Views: 430

Answers (2)

Caspar Kleijne

Reputation: 21864

Everything you need is in the Page.Render method. Override it and do whatever you need in there:

// Requires: using System.IO; using System.Text; using System.Web.UI;
protected override void Render(HtmlTextWriter writer)
{
    // Render the page into an in-memory buffer instead of the response.
    StringBuilder stringBuilder = new StringBuilder();
    using (StringWriter stringWriter = new StringWriter(stringBuilder))
    using (HtmlTextWriter htmlTextWriter = new HtmlTextWriter(stringWriter))
    {
        // htmlTextWriter writes through stringWriter into stringBuilder
        base.Render(htmlTextWriter);
    }

    string theHtml = stringBuilder.ToString(); // the captured HTML

    // Inspect (or modify) theHtml here.

    writer.Write(theHtml); // send the HTML on through the original writer
}
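For the "inspect theHtml" step, here is a minimal sketch of pattern detection using only regular expressions. The class name HtmlPatternLogger, the method FindMatches, and both patterns are hypothetical, made up for illustration; swap in whatever you actually need to detect.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlPatternLogger
{
    // Hypothetical example patterns, keyed by a human-readable label.
    private static readonly Dictionary<string, Regex> Patterns =
        new Dictionary<string, Regex>
        {
            // <script> tags that have no src attribute (inline scripts)
            { "inline script", new Regex(@"<script\b(?![^>]*\bsrc=)", RegexOptions.IgnoreCase | RegexOptions.Compiled) },
            // links that point at a plain http:// address
            { "plain http link", new Regex(@"href\s*=\s*[""']http://", RegexOptions.IgnoreCase | RegexOptions.Compiled) },
        };

    // Returns one message for each pattern found at least once in the HTML.
    public static List<string> FindMatches(string html)
    {
        var messages = new List<string>();
        foreach (var entry in Patterns)
        {
            int count = entry.Value.Matches(html).Count;
            if (count > 0)
            {
                messages.Add(entry.Key + ": " + count + " occurrence(s)");
            }
        }
        return messages;
    }
}
```

Inside Render you could then loop over HtmlPatternLogger.FindMatches(theHtml) and pass each message to your logger of choice, for example System.Diagnostics.Trace.WriteLine. Regular expressions are fine for spotting simple markers, but they do not understand HTML structure, so treat the matches as hints rather than a real parse.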

Upvotes: 2

Rex M

Reputation: 144122

It depends on what exactly you mean by "parse", but a library such as the HTML Agility Pack can build an XML-like structure from an HTML document, essentially a proper HTML DOM. You can then convert it straight to XML, query it with LINQ, and so on.
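As a rough sketch of what that could look like, the snippet below loads an HTML string and collects the href of every link. It assumes the HtmlAgilityPack NuGet package is referenced; the class HtmlInspector and the method GetLinkTargets are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlInspector
{
    // Returns the href value of every <a> element that has one.
    public static List<string> GetLinkTargets(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

Because the document is parsed into a node tree, the same approach works for any XPath query, which tends to be more robust than matching raw markup with string searches.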

Upvotes: 1
