I'm trying to find the correct XPath to use with the HTML Agility Pack. I've tried several XPaths, but every one of them ends in a System.NullReferenceException: "Object reference not set to an instance of an object."
Here are some of the XPaths I've tried, all of which give the same error:
//*[@id="div_team-stats-per_game"]
//html/body/div[2]/div[5]/div[4]/div[3]/div
//*[@id="team-stats-per_game"]/tbody
Here is my code:
HtmlWeb web1 = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc1 = new HtmlAgilityPack.HtmlDocument();
doc1 = web1.Load("https://www.basketball-reference.com/leagues/NBA_2021.html");
var _extractText = doc1.DocumentNode.SelectSingleNode("//*[@id=\"team-stats-per_game\"]").InnerText;
Console.WriteLine(_extractText);
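Are you able to use the LoadFromBrowser method? The Load method you are currently using only fetches the raw HTML returned by the server; it does not run JavaScript, so any dynamically loaded content is missing. Because the element you are looking for is not in that HTML, SelectSingleNode returns null, and reading .InnerText on null is what throws the NullReferenceException.
The table you are trying to parse is loaded into the page via JavaScript, so you need to use LoadFromBrowser and wait for the element to appear: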
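The crash itself can be reproduced without any network access. This sketch uses .NET's built-in System.Xml XmlNode.SelectSingleNode as a stand-in for HtmlAgilityPack's method of the same name (both return null when the XPath matches nothing); the markup and the helper name TryGetInnerText are hypothetical. Checking for null before touching InnerText turns the exception into a case you can handle.

```csharp
#nullable enable
using System;
using System.Xml;

// Stand-in demo: System.Xml is used instead of HtmlAgilityPack so the
// example runs offline. TryGetInnerText is a hypothetical helper.
public static class NullSafeXPath
{
    // Returns the matched node's inner text, or null when nothing matches.
    public static string? TryGetInnerText(XmlNode root, string xpath)
    {
        XmlNode? node = root.SelectSingleNode(xpath); // null if no match
        return node?.InnerText;                      // no exception on null
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical "raw" page: the table has not been added yet.
        doc.LoadXml("<html><body><div id=\"content\"></div></body></html>");

        string? text = TryGetInnerText(doc, "//*[@id='team-stats-per_game']");
        Console.WriteLine(text ?? "No node matched the XPath");
    }
}
```

With HtmlAgilityPack the same guard applies: store the result of SelectSingleNode in a variable and check it for null before reading InnerText.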
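HtmlWeb web1 = new HtmlWeb();
// LoadFromBrowser renders the page (running its JavaScript) and waits
// until the callback returns true
HtmlAgilityPack.HtmlDocument doc1 = web1.LoadFromBrowser("https://www.basketball-reference.com/leagues/NBA_2021.html", html => {
    // Return true once the table exists in the rendered HTML
    return html.Contains("<table id=\"team-stats-per_game\">");
});
var tableNode = doc1.DocumentNode.SelectSingleNode("//*[@id=\"team-stats-per_game\"]");
// SelectSingleNode returns null when nothing matches, so check before using it
if (tableNode != null)
    Console.WriteLine(tableNode.InnerText);
else
    Console.WriteLine("Table not found");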
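The callback passed to LoadFromBrowser is a readiness check: as I understand the HtmlAgilityPack docs, it should return true once the page is ready and false while it should keep waiting. The plain polling loop below is only a hypothetical illustration of that contract, not HtmlAgilityPack's implementation; WaitForHtml and the fake browser are made up for the example.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical helper showing the "wait until ready" pattern.
public static class WaitHelper
{
    // Polls getHtml until isReady returns true, or gives up after maxAttempts.
    public static string? WaitForHtml(Func<string> getHtml, Func<string, bool> isReady,
                                      int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string html = getHtml();
            if (isReady(html))
                return html;       // condition met: stop waiting
            Thread.Sleep(delay);   // not ready yet: wait and try again
        }
        return null;               // timed out: caller must handle this
    }

    public static void Main()
    {
        int calls = 0;
        // Simulated page: the table only appears on the third snapshot.
        Func<string> fakeBrowser = () =>
            ++calls < 3 ? "<div>loading</div>" : "<table id=\"team-stats-per_game\"></table>";

        string? html = WaitForHtml(fakeBrowser,
            h => h.Contains("id=\"team-stats-per_game\""), 10, TimeSpan.FromMilliseconds(10));
        Console.WriteLine(html != null ? $"Ready after {calls} snapshots" : "Timed out");
    }
}
```

Note the direction of the predicate: returning true for "element present" ends the wait, while negating it would end the wait immediately on a page where the table has not been rendered yet.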