Reputation: 7094
I want to grab the URL with the highest pg value:
$html ='
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=3"></a>
';
I use this regex to locate the appropriate links:
preg_match_all('/<a.*href="\.\/\?pg=(\d+)".*>(?:.*)<\/a>/U', $html, $preg_matches);
Sometimes, the links include another parameter:
http://example.com/?pg=3&test=1
My question is, how do I adjust my regex so links with the added parameters are included as well?
Upvotes: 1
Views: 45
Reputation: 639
$html ='
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=4&test=1"></a>
';
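// Capture the href of every <a> tag; lazy quantifiers keep each match inside one tag.
preg_match_all('/<a[^>]+href="(.*?)"[^>]*>(.*?)<\/a>/', $html, $out);
$result = [];
foreach ($out[1] as $link) {
    // Parse the query string so extra parameters such as &test=1 do not matter.
    parse_str((string) parse_url($link, PHP_URL_QUERY), $atr);
    if (isset($atr['pg'])) {
        $result[$link] = $atr['pg'];
    }
}
print_r($result);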
// "http://example.com/?pg=1" => "1"
// "http://example.com/?pg=2" => "2"
// "http://example.com/?pg=4&test=1" => "4"
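To get from this map to the single URL with the highest pg, which is what the question asks for, one option is to compare the values as integers and look up the key of the maximum. This is only a minimal sketch: it assumes $result has the shape printed above, and the helper name urlWithHighestPg is made up for illustration.

```php
<?php
// Hypothetical helper: returns the URL whose pg value is numerically largest,
// or null when the map is empty.
function urlWithHighestPg(array $pgByUrl): ?string
{
    if ($pgByUrl === []) {
        return null;
    }
    // parse_str() yields strings, so convert to integers before comparing
    // (as integers, 10 is greater than 9).
    $asInts = array_map('intval', $pgByUrl);
    // array_search() returns the first key that holds the maximum value.
    return array_search(max($asInts), $asInts, true);
}

$result = [
    'http://example.com/?pg=1' => '1',
    'http://example.com/?pg=2' => '2',
    'http://example.com/?pg=4&test=1' => '4',
];

echo urlWithHighestPg($result), "\n";
```

If two links share the highest pg, this sketch returns whichever appears first in the array.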
Upvotes: 0
Reputation: 12865
Example:
$html ='
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=3"></a>
';
// Load the markup only after $html has been defined.
$dom = new DOMDocument;
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $anchor) {
    $url = $anchor->getAttribute('href');
    $query = parse_url($url, PHP_URL_QUERY);
    parse_str($query, $output);
    $pg = $output['pg'];
    // do something with $pg
}
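The loop leaves the final step open, so here is one way it might be completed to return the link with the highest pg. Treat it as a sketch under assumptions rather than a definitive version: the function name highestPgHref is hypothetical, links without a purely numeric pg are skipped, and libxml warnings are silenced because a raw & in an href (as in ?pg=3&test=1) usually triggers one.

```php
<?php
// Hypothetical helper: scans every <a href> and returns the URL with the
// largest integer pg query parameter, or null if no link carries one.
function highestPgHref(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input on PHP 8
    }

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bestUrl = null;
    $bestPg = null;
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $url = $anchor->getAttribute('href');
        // The cast turns a missing query string (null) into ''.
        parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
        if (!isset($params['pg']) || !is_string($params['pg']) || !ctype_digit($params['pg'])) {
            continue; // skip links without a numeric pg
        }
        $pg = (int) $params['pg'];
        if ($bestPg === null || $pg > $bestPg) {
            $bestPg = $pg;
            $bestUrl = $url;
        }
    }
    return $bestUrl;
}

$sample = '
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=10&test=1"></a>
<a href="http://example.com/?pg=9"></a>
';

echo highestPgHref($sample), "\n";
```

Comparing integers rather than the raw strings matters once page numbers reach two digits, since the string "9" sorts after "10".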
Here's a helpful tutorial on parsing HTML with PHP: http://htmlparsing.com/php.html
Also see this answer on why you should not use regex to parse HTML: https://stackoverflow.com/a/1732454/81785
Upvotes: 1