Zereges

Reputation: 5209

html - Hidden content in div tag


I would like to download the content of a certain page and get one number from it (still not sure how, probably using the PHP DOM interface). I opened the page, started Firefox's debugger, picked the element with the number, and found that it is in <div id="lblOptimizePercent" class="wod-dpsval">98.4%</div> (98.4% is what I am looking for). So I opened the page source, pressed Ctrl+F for lblOptimizePercent, and all I found was <div id="lblOptimizePercent" class="wod-dpsval"></div> without any content. What have I done wrong? Or is it some protection by the site against having its content stolen?

Link to the original site

Upvotes: 0

Views: 860

Answers (2)

Jens A. Koch

Reputation: 41737

Normally, to scrape a page from PHP, you would

  1. download the page
  2. extract the value you want from the HTML with a regular expression
    • alternatives include DOM querying, for example with SimpleXML or PHP's DOM extension

The piece of HTML we are looking at is:

<div id="lblOptimizePercent" class="wod-dpsval">DATA</div>

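<?php
// Fetch the raw HTML as the server delivers it (no JavaScript is executed).
$text = file_get_contents('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic');
// Capture whatever sits between the opening and closing div tags.
$regexp = '#<div id="lblOptimizePercent" class="wod-dpsval">(.*?)</div>#';
if ($text !== false && preg_match($regexp, $text, $matches)) {
    $percentage = $matches[1];
    echo $percentage;
}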

This should give you DATA, the percentage value. But it doesn't. Why?


The data is inserted dynamically by JavaScript on the client side. The script uses the id or class selector to find the element in the DOM (element selection) and then sets its value.

http://api.jquery.com/id-selector/ - http://api.jquery.com/class-selector/

jQuery example

This site delivers <div id="lblOptimizePercent" class="wod-dpsval"></div> to the client and then runs an update call like $("#lblOptimizePercent").text("100%"); to fill in the percentage value.

If you want to read it on the client side, you can use $("#lblOptimizePercent").text();

Try this in your browser console. It returns the percentage value.
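To make the difference between the delivered source and the rendered DOM concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, no browser needed). The helper name extractPercent and the two HTML strings are illustrative assumptions built from the markup quoted in the question. They are not taken from the site's actual code.

```javascript
// Hypothetical helper: return the text inside the lblOptimizePercent div,
// or null if the div is not present in the given HTML string.
function extractPercent(html) {
  var match = /<div id="lblOptimizePercent" class="wod-dpsval">(.*?)<\/div>/.exec(html);
  return match ? match[1] : null;
}

// What the server sends: this is what "View Source" and file_get_contents() see.
var servedHtml = '<div id="lblOptimizePercent" class="wod-dpsval"></div>';

// What the DOM inspector shows after the page's script has filled in the value.
var renderedHtml = '<div id="lblOptimizePercent" class="wod-dpsval">98.4%</div>';

var fromSource = extractPercent(servedHtml);     // the div matches, but its content is empty
var fromRendered = extractPercent(renderedHtml); // the value the question is after

// Once a non-empty value is available, parseFloat drops the trailing "%".
var value = parseFloat(fromRendered);

console.log(JSON.stringify(fromSource), fromRendered, value);
```

The same pattern applied to the served HTML matches the element but captures an empty string, which is why the PHP regular expression above finds nothing useful.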


How to scrape this page?

If you want to scrape a page with dynamic data like this one, you need a browser environment that executes the page's JavaScript, such as PhantomJS or SlimerJS. Open the page with PhantomJS, run the jQuery command from above, and you are done.

This snippet should get you pretty close. Save it as scrape.js, then run it with PhantomJS.

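var page = require('webpage').create();
page.open('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic', function (status) {
  if (status !== 'success') {
    console.log('Failed to load the page');
    phantom.exit(1);
    return;
  }
  page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
    // evaluate() runs inside the page; return the text so it can be printed out here
    var percentage = page.evaluate(function () {
      return $("#lblOptimizePercent").text();
    });
    console.log(percentage);
    phantom.exit();
  });
});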

You can also save the evaluated page (now with the data filled in) and do the extraction with PHP. That is exactly like using Save Page in your browser and then working on the saved HTML file.

Upvotes: 2

pavel

Reputation: 27082

In Firebug or other web developer tools you see the generated content; in the page source there is only a blank element.

At first, while the page is rendering, the blank element is shown, and then the content is filled in using JS.

Crawlers that do not execute JavaScript (Googlebot etc.) can't see this JS-generated content, but that is no problem in this case.

Code:

document.getElementById('lblOptimizePercent').innerHTML = '94%'; 

Or similarly using jQuery:

$('#lblOptimizePercent').html('94%');
// jQuery must be loaded first, of course

Upvotes: 2
