Reputation: 2132
I have a page that contains several hyperlinks. The ones I want to get are of the format:
<html>
<body>
<div id="diva">
<a href="/123" >text2</a>
</div>
<div id="divb">
<a href="/345" >text1</a>
<a href="/678" >text2</a>
</div>
</body>
</html>
I want to extract the three hrefs: 123, 345, and 678.
I know how to get all the hyperlinks using $gm = $xpath->query("//a") and then loop through them to get the href attribute.
Is there some sort of regexp to get only the attributes with the above format (i.e. "/digits")?
Thanks
Upvotes: 2
Views: 872
Reputation: 89285
XPath 1.0, which is the version supported by DOMXPath, has no regex functionality. However, if you need one, you can easily write your own PHP function that runs a regular expression and call it from DOMXPath, as mentioned in this other answer.
There is an XPath 1.0 way to test whether an attribute value is a number. You can apply it to the part of the href attribute value after the / character to test whether the value follows the pattern /digits:
//a[number(substring-after(@href,'/')) = substring-after(@href,'/')]
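To make the expression above concrete, here is a minimal sketch of running it through DOMXPath. The markup is the sample from the question plus two extra links (/about and /12ab) that I added only to show what gets filtered out; the variable names are illustrative.

```php
<?php
// Sample markup from the question, plus two hypothetical non-matching links.
$html = <<<XML
<html>
<body>
<div id="diva">
<a href="/123">text2</a>
<a href="/about">other</a>
</div>
<div id="divb">
<a href="/345">text1</a>
<a href="/678">text2</a>
<a href="/12ab">other</a>
</div>
</body>
</html>
XML;

$doc = new DOMDocument;
$doc->loadXML($html);
$xpath = new DOMXPath($doc);

// number() yields NaN for non-numeric text, and NaN never equals anything,
// so only hrefs whose part after the first "/" is numeric are selected.
$query = "//a[number(substring-after(@href,'/')) = substring-after(@href,'/')]";

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    // Strip the leading slash to keep just the number.
    $hrefs[] = ltrim($a->getAttribute("href"), "/");
}

echo implode(",", $hrefs), "\n";
```

Note that this is a numeric test rather than a strict digits-only test: as far as I can tell, values such as /1.5 or /-3 would also pass, and so would an href like x/123 that does not start with a slash. If that matters, adding starts-with(@href,'/') to the predicate, or using a regex, is the safer route.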
UPDATE:
For the sake of completeness, here is a working example of calling the PHP function preg_match from DOMXPath::query() to accomplish the same task:
$raw_data = <<<XML
<html>
<body>
<div id="diva">
<a href="/123" >text2</a>
</div>
<div id="divb">
<a href="/345" >text1</a>
<a href="/678" >text2</a>
</div>
</body>
</html>
XML;
$doc = new DOMDocument;
$doc->loadXML($raw_data);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("preg_match");
// php:function's parameters below are :
// parameter 1: PHP function name
// parameter 2: PHP function's 1st parameter, the pattern
// parameter 3: PHP function's 2nd parameter, the string
$gm = $xpath->query("//a[php:function('preg_match', '~^/\d+$~', string(@href))]");
foreach ($gm as $a) {
    echo $a->getAttribute("href") . "\n";
}
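The same mechanism should also work with a function you write yourself, which is the "write your own PHP function" option mentioned earlier. The sketch below assumes a hypothetical helper named is_digit_path (not part of PHP or of the original answer) and a shortened sample document.

```php
<?php
// Hypothetical helper: true when $href is "/" followed only by digits.
function is_digit_path(string $href): bool
{
    return strlen($href) > 1
        && $href[0] === "/"
        && ctype_digit(substr($href, 1));
}

$doc = new DOMDocument;
$doc->loadXML('<div><a href="/123">a</a><a href="/x9">b</a><a href="/678">c</a></div>');
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
// Only functions registered here may be called from the XPath expression.
$xpath->registerPHPFunctions("is_digit_path");

$matched = [];
foreach ($xpath->query("//a[php:function('is_digit_path', string(@href))]") as $a) {
    $matched[] = $a->getAttribute("href");
}

echo implode(",", $matched), "\n";
```

Keeping the check in a named function avoids quoting a regex inside the XPath string, at the cost of one extra registration step.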
Upvotes: 3