KobCoder

Reputation: 66

html-agility-pack: retrieved HTML contains 'Loading...' instead of the data

I am trying to get the HTML of a site with html-agility-pack:

using System;
using HtmlAgilityPack;

private static void GetHtml()
{
    var url = ".....";

    // Download the page and parse it into an HtmlDocument
    HtmlWeb web = new HtmlWeb();
    HtmlDocument htmlDoc = web.Load(url);

    HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//body");

    Console.WriteLine(node.OuterHtml);
}

But where the data should appear, the page only contains 'Loading....'.

How can I solve this problem?


Upvotes: 2

Views: 532

Answers (2)

Daniel Manta

Reputation: 6683

You are getting a "Loading" message because that is what the original HTML source of the page contains. After the document is loaded in your browser, new content is generated by scripts running on the page, but HtmlAgilityPack can't see that: it is a library for parsing HTML, and it does not execute JavaScript.

Update: Recent versions of HtmlAgilityPack can run a WebBrowser (System.Windows.Forms) in the background and execute the JavaScript on the page when you call the LoadFromBrowser() method. The dynamically generated HTML can then be scraped from the resulting page. See http://html-agility-pack.net/from-browser.
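A minimal sketch of the LoadFromBrowser() approach, assuming a .NET Framework project that references HtmlAgilityPack and System.Windows.Forms (as far as I know the browser-based loading is only available in the .NET Framework builds of the library). The URL is a hypothetical placeholder, and the helper `IsStillLoading` is a hypothetical name for this example: it assumes the page shows "Loading..." until its scripts have inserted the real data. The lambda is the wait condition passed to the `LoadFromBrowser(string, Func<object, bool>)` overload, which receives the hidden WebBrowser instance.

```csharp
using System;
using System.Windows.Forms;
using HtmlAgilityPack;

internal static class Program
{
    // The WebBrowser control is COM-based and needs a single-threaded apartment.
    [STAThread]
    private static void Main()
    {
        // Hypothetical URL: replace with the page that shows "Loading..."
        const string url = "https://example.com/";

        var web = new HtmlWeb();

        // Wait condition: keep waiting while the body is empty or still
        // shows the placeholder text (assumption about the target page).
        // 'var' avoids the clash between HtmlAgilityPack.HtmlDocument and
        // System.Windows.Forms.HtmlDocument.
        var doc = web.LoadFromBrowser(url, o =>
        {
            var browser = (WebBrowser)o;
            string text = browser.Document?.Body?.InnerText ?? string.Empty;
            return text.Length > 0 && !IsStillLoading(text);
        });

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        Console.WriteLine(body?.OuterHtml ?? "(no <body> found)");
    }

    // Hypothetical helper: true while the placeholder text is still present.
    internal static bool IsStillLoading(string text) =>
        text.IndexOf("Loading", StringComparison.OrdinalIgnoreCase) >= 0;
}
```

Matching on the placeholder text is only a heuristic; if the page has a stable element that holds the data, waiting until that element is non-empty is usually more reliable.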

Upvotes: 2

KobCoder

Reputation: 66

Thank you for the answer, you are right: the problem is that the JavaScript does not run.

I have already solved it using GeckoFX:

using System.IO;
using System.Text;
using Gecko;

// Navigate() returns immediately, so read the DOM only after the
// page has finished loading.
geckoWebBrowser1.DocumentCompleted += (sender, e) =>
{
    if (geckoWebBrowser1.Document.DocumentElement is GeckoHtmlElement element)
    {
        // InnerHtml reflects the live DOM, including script-generated content
        string innerHtml = element.InnerHtml;

        // Save the rendered markup as UTF-8
        File.WriteAllText("aaa.html", innerHtml, Encoding.UTF8);
    }
};

geckoWebBrowser1.Navigate("google.com");

Upvotes: 0
