Reputation: 670
I want to scrape a star-based rating. This is the corresponding markup:
<div class="product_detail_info_rating_stars">
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star"></div>
</div>
Every rating uses this code snippet. I am looking for a way to convert such snippets into numbers; this one, for example, would be a 4 (4 of 5 stars).
The approach that comes to mind is to match the whole block for each rating, then match the full class inside it and count the occurrences, but maybe there is a better way that I am not seeing.
Is there a better way to solve this problem?
Thanks!
Upvotes: 0
Views: 506
Reputation: 193261
Here is a quick example of how you can use the SimpleXML parser and XPath.
// Get your page HTML string
$html = file_get_contents('1page.htm');
// To suppress invalid markup warnings
libxml_use_internal_errors(true);
// Parse the HTML with DOMDocument, then convert it to a SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
// Find the rating container nodes
$blocks = $xml->xpath('//div[contains(@class, "product_detail_info_rating_stars")]');
foreach ($blocks as $block) {
    // Count the child divs that carry the "full" class
    $count = 0;
    foreach ($block->children() as $child) {
        if ((string) $child['class'] === 'product_detail_star full') {
            $count++;
        }
    }
    echo '<pre>Rating: ' . $count . ' of ' . $block->count() . '</pre>';
}
// Clear invalid markup error buffer
libxml_clear_errors();
For a test HTML page like this:
<!doctype html>
<html>
<head></head>
<body>
<table>
<tr>
<td>
<div class="product_detail_info_rating_stars">
<div class="product_detail_star full"></div>
<div class="product_detail_star"></div>
<div class="product_detail_star"></div>
<div class="product_detail_star"></div>
<div class="product_detail_star"></div>
</div>
</td>
</tr>
<tr>
<td>
<div class="product_detail_info_rating_stars">
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star"></div>
<div class="product_detail_star"></div>
<div class="product_detail_star"></div>
</div>
</td>
</tr>
<tr>
<td>
<div class="product_detail_info_rating_stars">
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star full"></div>
<div class="product_detail_star"></div>
</div>
</td>
</tr>
</table>
</body>
</html>
It will output something like:
Rating: 1 of 5
Rating: 2 of 5
Rating: 4 of 5
Play with this to adjust to your needs.
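One possible adjustment, as a rough sketch rather than a definitive version: the exact string comparison on the class attribute stops matching if the site reorders the classes or adds another one. Assuming the markup otherwise stays as shown in the question, XPath's count() can do the counting and match full as a whole class token. The extractRatings() helper name and the inline $html sample are made up for this illustration.

```php
<?php
// Hypothetical helper (not part of the answer above): returns one
// [fullStars, totalStars] pair per rating block found in the HTML string.
function extractRatings(string $html): array
{
    // Collect invalid-markup warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "full" as a whole class token, independent of class order
    $fullStar = 'div[contains(concat(" ", normalize-space(@class), " "), " full ")]';

    $ratings = [];
    foreach ($xpath->query('//div[contains(@class, "product_detail_info_rating_stars")]') as $block) {
        $ratings[] = [
            (int) $xpath->evaluate("count($fullStar)", $block), // full stars
            (int) $xpath->evaluate('count(div)', $block),       // all stars
        ];
    }
    return $ratings;
}

// Sample input assumed for this sketch: the 4-of-5 block from the question
$html = '<div class="product_detail_info_rating_stars">'
      . str_repeat('<div class="product_detail_star full"></div>', 4)
      . '<div class="product_detail_star"></div>'
      . '</div>';

foreach (extractRatings($html) as [$full, $total]) {
    echo "Rating: $full of $total\n";
}
```

For the sample block this prints Rating: 4 of 5. Returning the pairs instead of echoing them inside the loop keeps the parsing separate from the output, so the numbers can be stored or averaged later.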
Upvotes: 2