html - Hidden content in div tag

Question

I would like to download the content of a certain page and get one number from it (I'm still not sure how, probably using PHP's DOM interface). I opened the page, started Firefox's developer tools, picked the element containing the number and found out that it is in a div with the id lblOptimizePercent:

98.4%

(98.4% is what I am looking for). So I opened the page source, searched (Ctrl+F) for lblOptimizePercent, and all I found was that element without any content. What have I done wrong? Or is it some kind of protection against stealing content?

Link to the original site

Jens A. Koch · Accepted Answer

Normally, to scrape the page from PHP, you would have to

- save the page
- extract the value you want from the HTML via a regular expression
  - alternatives include DOM querying with SimpleXML or PHP's DOMDocument/DOMXPath

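The piece of HTML we are looking at is the div with the id lblOptimizePercent:

<div id="lblOptimizePercent">DATA</div>

A regular expression for it could look like this (the file name is only an example):

$text = file_get_contents('saved-page.html'); // the saved page
$regexp = '~<div[^>]*id="lblOptimizePercent"[^>]*>([^<]*)</div>~';
if (preg_match($regexp, $text, $matches)) {
    $percentage = $matches[1]; // text between the opening and closing div tags
    echo $percentage;
}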

This should give you DATA, the percentage value. But it doesn't! Why?

The data is inserted dynamically by JavaScript on the client side. The server sends the element empty; a script then selects it by its id or class (DOM querying) and sets the data value.

http://api.jquery.com/id-selector/ - http://api.jquery.com/class-selector/
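To make the difference concrete, here is a minimal Node.js sketch. The two HTML strings are hypothetical stand-ins: the first mimics what the server sends (the empty element the question found in the page source), the second what the DOM holds after the page's script has filled in the value. The same kind of regular expression finds the element in both, but only the second one contains the number. The helper name `extractPercent` is made up for illustration.

```javascript
// Hypothetical markup: the real page may use other attributes or nesting.
const serverHtml = '<div id="lblOptimizePercent"></div>';          // page source
const renderedHtml = '<div id="lblOptimizePercent">98.4%</div>';   // after JS ran

// Capture the text between the opening <div ... id="lblOptimizePercent" ...> tag
// and the next "<". Returns null if the element is missing entirely.
function extractPercent(html) {
  const match = html.match(/<div[^>]*\bid="lblOptimizePercent"[^>]*>([^<]*)</);
  return match ? match[1].trim() : null;
}

console.log(JSON.stringify(extractPercent(serverHtml)));   // "" (found, but empty)
console.log(JSON.stringify(extractPercent(renderedHtml))); // "98.4%"
```

A scraper that only downloads the page source always sees the first case, which is why the regex returns nothing useful.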

jQuery example

On this site they deliver the div with the id lblOptimizePercent to the client without any content, and then they use an update query like $("#lblOptimizePercent").text("100%"); to set the percentage value.

If you want to query it on the client side, you can use $("#lblOptimizePercent").text();

Try this in your console. It returns the percentage value.
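If a page does not load jQuery, the same lookup works with the standard DOM API. The sketch below is a plain-JavaScript equivalent of `$("#lblOptimizePercent").text()`; the helper name `readOptimizePercent` is hypothetical. It takes the document as a parameter so it can also be exercised outside a browser with a stand-in object.

```javascript
// Hypothetical helper: plain-DOM equivalent of $("#lblOptimizePercent").text().
// In a browser console, call it as readOptimizePercent(document).
function readOptimizePercent(doc) {
  const el = doc.getElementById("lblOptimizePercent");
  // jQuery's .text() returns "" for an empty selection; returning null here
  // keeps "element missing" distinguishable from "element still empty".
  return el === null ? null : el.textContent;
}

// Stand-in for `document` with an already-filled element (value is made up).
const fakeDocument = {
  getElementById: (id) =>
    id === "lblOptimizePercent" ? { textContent: "98.4%" } : null,
};

console.log(readOptimizePercent(fakeDocument)); // 98.4%
```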

How to scrape this page?

If you want to scrape a page with dynamically inserted data, you need a browser environment that executes the page's JavaScript: PhantomJS or SlimerJS are your friends. Open the page with PhantomJS, run the jQuery command from above, and you're done.

This snippet should get you pretty close. Save it as scrape.js, then run it with phantomjs scrape.js.

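var page = require('webpage').create();
page.open('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic', function(status) {
  if (status !== 'success') {
    console.log('Unable to load the page');
    phantom.exit(1);
    return;
  }
  page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
    // evaluate() runs inside the page; return the value instead of calling alert(),
    // which PhantomJS only reports when a page.onAlert handler is set.
    // If the value is filled in later (e.g. by AJAX), delay this with setTimeout.
    var percentage = page.evaluate(function() {
      return $("#lblOptimizePercent").text();
    });
    console.log(percentage);
    phantom.exit();
  });
});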

You can also save the "evaluated page" (now with data) and do the extraction with PHP. That's exactly like using Save Page in your browser and working on the saved HTML file.
