rootman

Reputation: 670

Using Regular Expression to count elements

I want to scrape a star-based rating. This is the corresponding markup:

<div class="product_detail_info_rating_stars">
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star"></div>
</div>

Every rating uses this snippet. I am looking for a way to convert each snippet into a number; the one above would be a 4 (4 of 5 stars).

The approach that comes to mind is to match the whole block for each rating with a regular expression, then match the "full" class inside it and count the matches, but maybe there is a better way that I am not seeing.

Is there a better way to solve this problem?

Thanks!

Upvotes: 0

Views: 506

Answers (1)

dfsq

Reputation: 193261

Here is a quick example of how you can use the SimpleXML parser and XPath instead of a regular expression.

// Get your page HTML string
$html = file_get_contents('1page.htm');

// To suppress invalid markup warnings
libxml_use_internal_errors(true);

// Create SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// Find every rating block
$blocks = $xml->xpath('//div[contains(@class, "product_detail_info_rating_stars")]');

foreach ($blocks as $block)
{
    $count = 0;
    foreach ($block->children() as $child) {
        // Exact match on the class attribute: only full stars are counted
        if ($child['class'] == 'product_detail_star full') {
            $count++;
        }
    }
    echo '<pre>'; print_r('Rating: ' . $count . ' of ' . $block->count()); echo '</pre>';
}

// Clear invalid markup error buffer
libxml_clear_errors();

For a test HTML page like this:

<!doctype html>
<html>
<head></head>
<body>

<table>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
</table>

</body>
</html>

It will output something like:

Rating: 1 of 5
Rating: 2 of 5
Rating: 4 of 5

Adjust this to fit your needs.
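One possible adjustment, as a rough sketch rather than a definitive implementation: let XPath do the counting with count(), and match class names as whole tokens so a star that carries extra classes is still recognised. The helper name countStarRatings and the inline sample markup are made up for illustration; in practice $html would hold the scraped page. It assumes PHP 7.1 or later with the DOM extension enabled.

```php
<?php
// Hypothetical sample markup; in practice this would be the scraped page.
$html = <<<HTML
<div class="product_detail_info_rating_stars">
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star"></div>
</div>
HTML;

// Hypothetical helper: returns one [fullStars, totalStars] pair per rating block.
function countStarRatings(string $html): array
{
    // Suppress invalid markup warnings while parsing, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Token-based class tests, so "full product_detail_star" would also match
    $star  = 'div[contains(concat(" ", normalize-space(@class), " "), " product_detail_star ")]';
    $full  = $star . '[contains(concat(" ", normalize-space(@class), " "), " full ")]';
    $block = '//div[contains(concat(" ", normalize-space(@class), " "), " product_detail_info_rating_stars ")]';

    $ratings = [];
    foreach ($xpath->query($block) as $node) {
        // count() is evaluated relative to the current rating block
        $ratings[] = [
            (int) $xpath->evaluate("count($full)", $node),
            (int) $xpath->evaluate("count($star)", $node),
        ];
    }
    return $ratings;
}

foreach (countStarRatings($html) as [$fullStars, $totalStars]) {
    echo "Rating: $fullStars of $totalStars\n";
}
```

Returning the pairs instead of echoing inside the loop makes it easier to store the numbers or compute an average later.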

Upvotes: 2
