Henrik Petterson

Reputation: 7094

Targeting URLs with parameters

I want to grab the URL with the highest pg value:

$html ='
    <a href="http://example.com/?pg=1"></a>
    <a href="http://example.com/?pg=2"></a>
    <a href="http://example.com/?pg=3"></a>
';

I use this regex to locate the appropriate links:

preg_match_all('/<a.*href="\.\/\?pg=(\d+)".*>(?:.*)<\/a>/U', $html, $preg_matches);

Sometimes, the links include another parameter:

http://example.com/?pg=3&test=1

How do I adjust my regex so that links with additional parameters are matched as well?

Upvotes: 1

Views: 45

Answers (2)

TsV

Reputation: 639

$html = '
    <a href="http://example.com/?pg=1"></a>
    <a href="http://example.com/?pg=2"></a>
    <a href="http://example.com/?pg=4&test=1"></a>
';

// Capture the href of every anchor, whatever its query string contains.
preg_match_all('/<a[^>]+href="(.*?)"[^>]*>(.*?)<\/a>/', $html, $out);

$result = [];
foreach ($out[1] as $link) {
    // Split the query string into an array of parameters.
    parse_str((string) parse_url($link, PHP_URL_QUERY), $atr);
    if (isset($atr['pg'])) {
        $result[$link] = $atr['pg'];
    }
}

print_r($result);

// "http://example.com/?pg=1" => "1"
// "http://example.com/?pg=2" => "2"
// "http://example.com/?pg=4&test=1" => "4"
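
To get from this map to the URL with the highest pg value, which is what the question asks for, one option is to sort the map numerically and take the first key. This is a minimal sketch, assuming $result has the link => pg shape shown above; the helper name highestPgUrl is hypothetical, and array_key_first needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: returns the URL whose pg value is numerically highest,
// or null when the map is empty.
function highestPgUrl(array $result): ?string
{
    if ($result === []) {
        return null;
    }
    // Sort by value, descending, comparing as numbers so "10" beats "9".
    arsort($result, SORT_NUMERIC);
    return array_key_first($result);
}

$result = [
    'http://example.com/?pg=1' => '1',
    'http://example.com/?pg=2' => '2',
    'http://example.com/?pg=4&test=1' => '4',
];

echo highestPgUrl($result), PHP_EOL; // http://example.com/?pg=4&test=1
```

The numeric sort flag matters once page numbers reach two digits, because a plain string comparison would rank "9" above "10".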

Upvotes: 0

Moak

Reputation: 12865

  1. Use a DOM parser to find the anchors.
  2. Use parse_url to extract the query string from each URL.
  3. Use parse_str to split the query string into its individual values.

Example:

$html ='
    <a href="http://example.com/?pg=1"></a>
    <a href="http://example.com/?pg=2"></a>
    <a href="http://example.com/?pg=3"></a>
';

// Parse the markup and collect every anchor.
$dom = new DOMDocument;
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $anchor) {
    $url = $anchor->getAttribute('href');
    // Query string only, e.g. "pg=3" or "pg=3&test=1".
    $query = parse_url($url, PHP_URL_QUERY);
    parse_str((string) $query, $output);
    $pg = $output['pg'] ?? null;
    //do something
}
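
The three steps above could be wrapped into a single function that tracks the largest pg seen so far and returns the matching URL. This is only a sketch: the name findHighestPgUrl is hypothetical, and it assumes pg is always a plain non-negative integer. libxml warnings are suppressed because real-world markup is rarely clean.

```php
<?php
// Hypothetical helper: returns the href with the numerically highest pg
// parameter found in $html, or null if no anchor carries one.
function findHighestPgUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string on PHP 8
    }

    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bestUrl = null;
    $bestPg = null;
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $url = $anchor->getAttribute('href');
        parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
        $pg = $params['pg'] ?? null;
        if (!is_string($pg) || !ctype_digit($pg)) {
            continue; // skip links without a numeric pg
        }
        if ($bestPg === null || (int) $pg > $bestPg) {
            $bestPg = (int) $pg;
            $bestUrl = $url;
        }
    }
    return $bestUrl;
}

$html = '
    <a href="http://example.com/?pg=1"></a>
    <a href="http://example.com/?pg=3&amp;test=1"></a>
    <a href="http://example.com/?pg=2"></a>
';

echo findHighestPgUrl($html), PHP_EOL; // http://example.com/?pg=3&test=1
```

Because the DOM parser decodes entities, an href written as &amp;test=1 comes back as &test=1, so extra parameters need no special handling.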

Here's a helpful tutorial on parsing HTML with PHP: http://htmlparsing.com/php.html

Also see this answer on why you should not use regex to parse HTML: https://stackoverflow.com/a/1732454/81785

Upvotes: 1
