Trey Balut

Reputation: 1395

How do I find the correct XPath?

I'm trying to find the correct XPath to use with the HTML Agility Pack. Every XPath I've tried throws the same exception: System.NullReferenceException, "Object reference not set to an instance of an object."

Here are some of the XPaths I've tried. All of them give me the error:

        //*[@id="div_team-stats-per_game"]
        //html/body/div[2]/div[5]/div[4]/div[3]/div
        //*[@id="team-stats-per_game"]/tbody

Here is my code:

        HtmlWeb web1 = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();
        doc1 = web1.Load("https://www.basketball-reference.com/leagues/NBA_2021.html");
        var _extractText = doc1.DocumentNode.SelectSingleNode("//*[@id=\"team-stats-per_game\"]").InnerText;
        Console.WriteLine(_extractText);

Upvotes: 0

Views: 79

Answers (1)

Ben S

Reputation: 36

Are you able to use the LoadFromBrowser method? The Load method you are currently using fetches the raw HTML of the document, but not the dynamically loaded content.

The table you are trying to parse is loaded into the page via JavaScript, so you would need to use the LoadFromBrowser method and wait for the element to appear:

        HtmlWeb web1 = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();

        doc1 = web1.LoadFromBrowser("https://www.basketball-reference.com/leagues/NBA_2021.html", html => {
            // Keep waiting until the table element exists: return true once it is present
            return html.Contains("<table id=\"team-stats-per_game\">");
        });

        var _extractText = doc1.DocumentNode.SelectSingleNode("//*[@id=\"team-stats-per_game\"]").InnerText;
        Console.WriteLine(_extractText);
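
As a side note, the NullReferenceException itself comes from chaining .InnerText onto SelectSingleNode, which returns null when the XPath matches nothing. Here is a minimal sketch of a null check, assuming the HtmlAgilityPack NuGet package is referenced. The inline HTML string is a made-up stand-in for a page where the table is missing, not the real page content:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the downloaded page: the table id is absent.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div id=\"other\">no table here</div></body></html>");

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//*[@id=\"team-stats-per_game\"]");

        if (table == null)
        {
            // Report the miss instead of dereferencing null.
            Console.WriteLine("Table not found in the loaded HTML.");
        }
        else
        {
            Console.WriteLine(table.InnerText);
        }
    }
}
```

Checking for null this way tells you whether the XPath is wrong or the element simply is not in the HTML that was loaded.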

Upvotes: 2
