AAP

Reputation: 169

HtmlAgilityPack: Web page doesn't return complete HTML

I'm using HtmlAgilityPack to get all href links, but the web page doesn't return all of them.

I tried it in a browser and saw that the page doesn't show all links until you scroll all the way down. Then I resized (zoomed) the browser window so that all of the page content was visible without scrolling, and at that moment all the links appeared. Maybe JavaScript needs to be triggered...

using System.Diagnostics;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.verkkokauppa.com/fi/catalog/438b/Televisiot/products?page=1");

// Select every product link in the grid and print its href attribute
foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//li[@class='product-list-grid__grid-item']/a"))
{
    Debug.WriteLine(item.GetAttributeValue("href", string.Empty));
}

One page has 24 product links, but I only get 15 of them.

Upvotes: 0

Views: 1012

Answers (1)

Maxim Tkachenko

Reputation: 5808

Check the Network tab in Chrome on that page. There are AJAX requests to https://www.verkkokauppa.com/resp-api/product?pids=467610, so the products are loaded with JavaScript.
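One option is to call that endpoint yourself and skip the rendered page entirely. This is only a minimal sketch: the pids value is the one visible in the Network tab, the real IDs would have to be scraped from the initial HTML or the page's embedded data, and the JSON shape of the response isn't documented here, so the raw body is just printed for inspection.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class RespApiSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();

        // Example pid taken from the observed request; replace with the IDs
        // extracted from the catalog page you are scraping.
        string url = "https://www.verkkokauppa.com/resp-api/product?pids=467610";

        string json = await client.GetStringAsync(url);
        Console.WriteLine(json); // inspect the structure before deserializing
    }
}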

You can't just trigger JavaScript here. HtmlAgilityPack is an HTML parser. If you want to work with dynamic content, you need a browser engine. I think you should check out Selenium and PhantomJS.
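A minimal Selenium sketch, using headless Chrome instead of PhantomJS, assuming the Selenium.WebDriver package and a matching ChromeDriver are installed; the CSS selector mirrors the XPath from the question, and the fixed sleep is a crude stand-in for a proper WebDriverWait:

using System;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class SeleniumSketch
{
    static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl("https://www.verkkokauppa.com/fi/catalog/438b/Televisiot/products?page=1");

        // Scroll to the bottom to trigger any lazy loading, then give the
        // AJAX requests a moment to finish.
        ((IJavaScriptExecutor)driver).ExecuteScript("window.scrollTo(0, document.body.scrollHeight);");
        Thread.Sleep(3000);

        // Now the JavaScript-loaded products exist in the DOM.
        foreach (var link in driver.FindElements(By.CssSelector("li.product-list-grid__grid-item > a")))
        {
            Console.WriteLine(link.GetAttribute("href"));
        }
    }
}

Because the browser actually executes the page's JavaScript, this should see all 24 product links instead of only the 15 present in the initial HTML.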

Upvotes: 1
