John Smith

Reputation: 4516

Is there a straightforward way to retrieve text that is rendered by the browser but is not hard-coded in the actual HTML file?

I'm trying to retrieve data from a webpage but I cannot do it by making a web request and parsing the resulting html file because the actual text that I'm trying to retrieve is not in the html file! I imagine that this text is pulled using some script and for that reason it's not in the html file. For all I know I'm looking at the wrong data, but assuming that my theory is correct, is there a straightforward way to retrieve whatever text is displayed by the browser (Firefox or IE) rather than attempt to fetch the text from the html file?

Upvotes: 2

Views: 116

Answers (2)

GCD

Reputation: 394

Another option is to load the web page in a WebBrowser control, which should execute the page's scripts. Once the document has finished loading, you can get the HtmlDocument object and go from there.

Take a look at this example:

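    // Requires: using System; using System.Windows.Forms;
    // WebBrowser needs an STA thread with a running message loop (e.g. a WinForms app).
    private void test()
    {
        WebBrowser wBrowser1 = new WebBrowser();
        wBrowser1.ScriptErrorsSuppressed = true; // don't show script error dialogs
        wBrowser1.DocumentCompleted += wBrowser1_DocumentCompleted;
        wBrowser1.Navigate(new Uri("Web Page URL")); // placeholder: use a real absolute URL
    }

    void wBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser browser = (WebBrowser)sender;
        // DocumentCompleted also fires for each frame; wait for the top-level page.
        if (e.Url != browser.Url) return;

        HtmlDocument document = browser.Document;
        // Get elements and values accordingly, e.g. document.Body.InnerText.
        // Text injected by scripts after load may need a short wait before it appears.
    }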

Upvotes: 0

cowls

Reputation: 24334

Assuming you are referring to text that is generated by JavaScript in the browser, you can use PhantomJS to retrieve it: http://phantomjs.org/

It is essentially a headless browser that will process JavaScript.

You may need to run this as an external program, but I'm sure you can do that through C#.
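
As a rough sketch of the "external program" route, the C# below starts a process and captures whatever it prints to standard output. It uses only System.Diagnostics. The class and method names (RenderedText, BuildStartInfo, Run) are made up for illustration, and it assumes you have separately written a PhantomJS script (called dump.js here, a hypothetical name) that opens the URL passed to it and prints the rendered text.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: names are illustrative, not part of any library.
static class RenderedText
{
    // Describes how to launch the external headless browser.
    // exePath, scriptPath and url are supplied by the caller.
    public static ProcessStartInfo BuildStartInfo(string exePath, string scriptPath, string url)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            // Quote both arguments so paths or URLs with spaces survive.
            Arguments = "\"" + scriptPath + "\" \"" + url + "\"",
            UseShellExecute = false,        // required to redirect output
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
    }

    // Runs the process and returns everything it wrote to standard output.
    public static string Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            // Read before WaitForExit so a full output pipe cannot block the child.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

Assuming the phantomjs executable is on your PATH, usage would look something like `string text = RenderedText.Run(RenderedText.BuildStartInfo("phantomjs", "dump.js", "http://example.com/"));`, after which you can parse `text` however you like.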

Upvotes: 1
