JohnDotOwl

Reputation: 3755

Extract full HTML source, not partial

I am trying to extract images and some text from the following site: http://bit.ly/16jFeyA

Web Form, C#, Visual Studio, HtmlAgilityPack

Encoding works well with WebClient only. Setting wb.Document.Encoding = "GB2312"; on the browser control doesn't work, but that is not important.

The site uses lazy loading for images. The WebBrowser control loads the page properly, with the images and their info, but when I extract the source using either WebClient or wb.DocumentText, I do not get the full information: some of it is missing, especially the image links.

Is there any way around this? I am trying to extract images and product info.

Extracted using wb.DocumentText after scrolling down to force the images to load (because of the lazy loading): http://notepad.cc/share/EjW3tFCffO

wb = webBrowser

Thanks in advance!

Upvotes: 0

Views: 261

Answers (1)

Matt Ball

Reputation: 359786

You need to use something which knows how to evaluate and execute client-side JavaScript, such as a headless browser. PhantomJS should suffice.
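
As a rough illustration of the headless-browser approach, here is a minimal PhantomJS script sketch. It opens the page, scrolls to the bottom so a lazy loader has a chance to fire, waits, and then reads the image URLs from the rendered DOM. The script name `extract.js`, the 3-second wait, the viewport size, and the `uniqueUrls` helper are all assumptions made for this example; a real lazy loader may need several incremental scrolls or a longer delay.

```javascript
// Sketch only. Run as: phantomjs extract.js <url>
// (extract.js, the timings, and uniqueUrls are illustrative assumptions.)

// Hypothetical helper: drop empty entries and duplicates from a URL list.
function uniqueUrls(urls) {
    var seen = {};
    var out = [];
    for (var i = 0; i < urls.length; i++) {
        var u = urls[i];
        if (u && !seen.hasOwnProperty(u)) {
            seen[u] = true;
            out.push(u);
        }
    }
    return out;
}

// Only run the browser part when executed inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    var url = system.args[1]; // first command-line argument after the script name

    page.viewportSize = { width: 1280, height: 1024 };

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + url);
            phantom.exit(1);
            return;
        }

        // Scroll inside the page so the lazy loader sees the images as visible.
        page.evaluate(function () {
            window.scrollTo(0, document.body.scrollHeight);
        });

        // Give the page's scripts time to swap in the real image URLs.
        window.setTimeout(function () {
            var srcs = page.evaluate(function () {
                var imgs = document.getElementsByTagName('img');
                var result = [];
                for (var i = 0; i < imgs.length; i++) {
                    result.push(imgs[i].src);
                }
                return result;
            });

            console.log(uniqueUrls(srcs).join('\n'));
            // page.content holds the rendered HTML at this point and could be
            // written out and parsed with HtmlAgilityPack instead.
            phantom.exit(0);
        }, 3000);
    });
}
```

The printed list (or the saved `page.content`) can then be consumed from the C# side, for example by launching PhantomJS as a child process and reading its standard output.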

Upvotes: 2
