Reputation: 5209
I would like to download the content of a certain page and extract one number from it (I'm still not sure how, probably using PHP's DOM interface). I opened the page, started Firefox's developer tools, picked the element containing the number and found that it is in <div id="lblOptimizePercent" class="wod-dpsval">98.4%</div>
(98.4% is what I am looking for). So I opened the page source, pressed Ctrl+F and searched for lblOptimizePercent,
and all I found was <div id="lblOptimizePercent" class="wod-dpsval"></div>
without any content. What have I done wrong? Or is this some kind of protection the site uses to keep its content from being stolen?
Upvotes: 0
Views: 860
Reputation: 41737
Normally, to scrape the page from PHP, you would have to fetch the HTML and then extract the value from it, for example with a regular expression.
The piece of HTML we are looking at is:
<div id="lblOptimizePercent" class="wod-dpsval">DATA</div>
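<?php
// Fetch the raw HTML exactly as the server delivers it (no JavaScript is executed).
$text = file_get_contents('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic');
// Capture whatever sits between the opening and closing div tags (non-greedy).
$regexp = '~<div id="lblOptimizePercent" class="wod-dpsval">(.*?)</div>~';
if (preg_match($regexp, $text, $matches)) {
    $percentage = $matches[1];
    echo $percentage;
}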
This should give you DATA, the percentage value. But it doesn't! Why?
The data is inserted dynamically by JavaScript on the client side. The id or class selector is used for DOM querying (element selection), and then the data value is added.
http://api.jquery.com/id-selector/ - http://api.jquery.com/class-selector/
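To make the difference concrete, here is a minimal sketch in plain JavaScript (runnable with Node) that applies the same pattern to the markup as the server sends it and to the markup as the browser shows it after the script has run. The helper name extractPercent and the two sample strings are assumptions made for illustration; the strings are copied from the question rather than fetched from the site.

```javascript
// Hypothetical helper: pull the text out of the #lblOptimizePercent div.
function extractPercent(html) {
  var match = /<div id="lblOptimizePercent" class="wod-dpsval">(.*?)<\/div>/.exec(html);
  return match ? match[1] : null;
}

// What the server sends (this is all file_get_contents ever sees).
var servedHtml = '<div id="lblOptimizePercent" class="wod-dpsval"></div>';
// What the browser shows after the page's script has filled in the value.
var renderedHtml = '<div id="lblOptimizePercent" class="wod-dpsval">98.4%</div>';

console.log(JSON.stringify(extractPercent(servedHtml)));   // ""
console.log(JSON.stringify(extractPercent(renderedHtml))); // "98.4%"
```

The pattern itself matches in both cases; the served markup simply has nothing between the tags yet.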
jQuery example
On this site they deliver <div id="lblOptimizePercent" class="wod-dpsval"></div>
to the client and then they use an update query like this: $("#lblOptimizePercent").text("100%");
to update the percentage value.
If you want to query it on client-side, you might use $("#lblOptimizePercent").text();
Try this in your console. It returns the percentage value.
How to scrape this page?
If you want to scrape a page with dynamic data like this one, you need a browser environment for scraping, such as PhantomJS or SlimerJS. Open the page with PhantomJS, run the jQuery call from above, and you are done.
This snippet should get you pretty close. Save it as scrape.js, then execute it with PhantomJS.
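var page = require('webpage').create();
page.open('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic', function() {
  page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
    // evaluate() runs inside the page; return the text to the PhantomJS side
    // (an alert() in the page would not be printed unless page.onAlert is set).
    var percentage = page.evaluate(function() {
      return $("#lblOptimizePercent").text();
    });
    console.log(percentage);
    phantom.exit();
  });
});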
You can also save the evaluated page (which now contains the data) and do the extraction with PHP. That is equivalent to using Save Page in your browser and working on the saved HTML file.
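As a rough sketch of that second step, the following Node script reads a saved page from disk and pulls out the value. It is written in JavaScript rather than PHP to stay with the language of the snippet above. The helper percentFromSavedPage, the file name rendered-page.html and the sample markup written to it are all assumptions for illustration; in practice the file would be the page you saved after rendering.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: read a saved page and return the div's text, or null.
function percentFromSavedPage(filePath) {
  var html = fs.readFileSync(filePath, 'utf8');
  // Match on the id only, so extra or reordered attributes do not matter.
  var match = /<div[^>]*\bid="lblOptimizePercent"[^>]*>([^<]*)<\/div>/.exec(html);
  return match ? match[1].trim() : null;
}

// Stand-in for a page saved after rendering; the file name is made up.
var savedPath = path.join(os.tmpdir(), 'rendered-page.html');
fs.writeFileSync(savedPath, '<html><body><div id="lblOptimizePercent" class="wod-dpsval">98.4%</div></body></html>');

console.log(percentFromSavedPage(savedPath)); // 98.4%
```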
Upvotes: 2
Reputation: 27082
In Firebug or other web developer tools you see the generated content; in the page source there is only an empty element.
At first (while the page is rendering) the empty element is shown, and then the content is filled in using JS.
Googlebot etc. can't see this JS-generated content, but that is not a problem in this case.
Code:
document.getElementById('lblOptimizePercent').innerHTML = '94%';
Or similarly using jQuery:
$('#lblOptimizePercent').html('94%');
// need to load jQuery before, of course
Upvotes: 2