Adam Lewison

Reputation: 15

Regular Expression not matching content in PHP

I am trying to scrape an eBay page such as this one: http://www.ebay.co.uk/sch/Cars-/9801/i.html?_nkw=vw+golf

Everything works except one of my regular expressions, which just isn't matching the content, so nothing is pushed to $linksArray. I have output the page contents to make sure that what I am trying to match is in fact there, and it is. I then call print_r($linksArray), where all the matches should be, but they aren't: it is an empty multidimensional array. You can see my live example here: http://www.mycommunity.co.za/marcksack/index.php
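For reference, an "empty multidimensional array" is exactly what preg_match_all() leaves behind when the pattern finds nothing: with the default PREG_PATTERN_ORDER it still creates one sub-array per group ([0] for full matches, [1] for the first capture group), each of them empty. A minimal sketch, assuming a hypothetical one-line snippet with two spaces before class="vip" (not the real eBay page):

```php
<?php
// Hypothetical sample markup: TWO spaces before class="vip",
// while the pattern below expects exactly one, so nothing matches.
$html = '<h3 class="lvtitle"><a href="http://example.com/item/1"  class="vip">Golf</a></h3>';

$matches = array();
$count = preg_match_all('/class="lvtitle"><a href="(.*)" class="vip"/', $html, $matches);

// preg_match_all() returns the number of full matches (0 here), but still fills
// $matches with one sub-array per group: [0] = full matches, [1] = capture group 1.
var_dump($count);   // int(0)
print_r($matches);  // two empty sub-arrays, [0] and [1]
```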

Here is my PHP code:

<?php
echo '<form method="POST">
<input type="text" id="url" name="url" size="120" value="' . (isset($_REQUEST["url"]) && !empty($_REQUEST["url"]) ? htmlspecialchars($_REQUEST["url"]) : "") . '"/>
<input type="submit" value="Submit" />
</form>';
flush();

if (isset($_REQUEST["url"]) && !empty($_REQUEST["url"])) {
    $url = $_REQUEST["url"];
    $phones = array();
    for ($page = 1; $page <= 1; $page++) {

        // get page contents

        $contents = file_get_contents($url . "&_pgn=" . $page);
        echo(htmlentities($contents));
        // find all link patterns
        // HERE IS THE PROBLEM
        $pattern = '/class="lvtitle"><a href="(.*)" class="vip"/';
        $linksArray = array();
        preg_match_all($pattern, $contents, $linksArray);
        print_r($linksArray);
        $links = $linksArray[0];

        foreach($links as $link) {
            $pureLink = str_replace("class=\"lvtitle\"><a href=\"", "", $link);
            $pureLink = str_replace("\" class=\"vip\"", "", $pureLink);

            // getting sub page contents

            $subContents = file_get_contents($pureLink);

            // find all phone number patterns

            $subContents = str_replace(" ", "", $subContents);
            $phonePattern = '/07[0-9]{9}/';
            $phonesArray = array();
            preg_match_all($phonePattern, $subContents, $phonesArray);
            foreach($phonesArray[0] as $element) {

                // check that the phone has not already been added to the phones array

                if (!in_array($element, $phones)) {

                    // add it to the phones array

                    array_push($phones, $element);
                    echo $element . "<br />";
                    flush();
                }
            }
        }
    }

    // print results
    foreach($phones as $phone){
        echo $phone."<br/>";
    }

}

?>

So my question is: what am I doing wrong? Why are the matches not being pushed to my $linksArray variable? I really appreciate your help!

Upvotes: 1

Views: 77

Answers (1)

Chip Dean

Reputation: 4302

This regex works:

"/ class=\"lvtitle\"><a href=\"([^\"]*)\"  class=\"vip\"/"
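A minimal sketch of that regex in use, run against hypothetical markup modelled on the answer's description (two spaces between the href and class attributes); the HTML string is an assumption, not eBay's actual page. It also shows that the captured URLs land in $linksArray[1], so the str_replace() clean-up the question applies to $linksArray[0] is no longer needed:

```php
<?php
// Hypothetical markup based on the answer's description of eBay's HTML.
// Real eBay markup may differ or may have changed since.
$html = '<h3 class="lvtitle"><a href="http://example.com/itm/111"  class="vip">Golf 1</a></h3>'
      . '<h3 class="lvtitle"><a href="http://example.com/itm/222"  class="vip">Golf 2</a></h3>';

// The answer's regex, written in single quotes so the " characters need no escaping.
$pattern = '/ class="lvtitle"><a href="([^"]*)"  class="vip"/';

$linksArray = array();
preg_match_all($pattern, $html, $linksArray);

// $linksArray[0] holds the full matches; $linksArray[1] holds only the captured URLs.
$links = $linksArray[1];
foreach ($links as $link) {
    echo $link, "\n";
}
```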

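A few issues with yours:

  1. You were trying to capture the URL using (.*), which is greedy: it matches as much as it can, so it runs to the last " class="vip" on the line instead of stopping at the end of the href.
  2. It was not matching at all because eBay's markup has two spaces between the href and class attributes, while your pattern has only one.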
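Both points can be seen side by side in a small sketch. The HTML below is hypothetical and deliberately uses ONE space before class="vip", so that the question's original pattern does match and its greediness is visible. The second pattern swaps the answer's literal two spaces for \s+ (one or more whitespace characters); that substitution is an editorial suggestion, not part of the answer, and simply makes the regex tolerant of either spacing:

```php
<?php
// Hypothetical single-line markup with two listings and one space before class="vip".
$html = '<h3 class="lvtitle"><a href="http://example.com/itm/111" class="vip">A</a></h3>'
      . '<h3 class="lvtitle"><a href="http://example.com/itm/222" class="vip">B</a></h3>';

// Greedy (.*): backtracks only as far as the LAST " class="vip" on the line,
// so a single capture swallows both listings.
preg_match_all('/class="lvtitle"><a href="(.*)" class="vip"/', $html, $greedy);

// Negated class ([^"]*): cannot cross a quote, so it stops at the end of each href.
// \s+ accepts one or more whitespace characters before class="vip".
preg_match_all('/class="lvtitle"><a href="([^"]*)"\s+class="vip"/', $html, $strict);

print_r($greedy[1]);  // one long capture spanning both links
print_r($strict[1]);  // two separate URLs
```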

Also, as has already been mentioned, you should use the API or DOMDocument for this. But in case you are curious, this is why it wasn't working. I hope that helps!
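For comparison, a minimal DOMDocument sketch: an HTML parser does not care how many spaces separate attributes, so the whitespace problem disappears. The class names lvtitle and vip come from the question; the sample HTML and the XPath expression are assumptions, and eBay's real page structure may differ:

```php
<?php
// Hypothetical markup; only the class names are taken from the question.
$html = '<html><body>'
      . '<h3 class="lvtitle"><a href="http://example.com/itm/111"  class="vip">Golf 1</a></h3>'
      . '<h3 class="lvtitle"><a href="http://example.com/itm/222"  class="vip">Golf 2</a></h3>'
      . '</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// <a> elements with class "vip" that are direct children of an element with class "lvtitle".
// contains() is used because a class attribute can hold several names.
$nodes = $xpath->query('//*[contains(@class, "lvtitle")]/a[contains(@class, "vip")]');

$links = array();
foreach ($nodes as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```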

Upvotes: 1
