Reputation: 409
The following regular expression extracts all hrefs from a page with 'preg_match_all':
/\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+/ims
If the 'a' tag has a 'rel' attribute, I would like to return it with the result. How do I modify the regular expression above to include the 'rel' attribute (if present)?
UPDATE: The following input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. <a href="http://example.com" rel="nofollow">Duis</a>
nirure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum.
returns:
Array
(
[0] => Array
(
[0] => href="http://example.com"
)
[1] => Array
(
[0] => http://example.com
)
)
I would like it to return:
Array
(
[0] => Array
(
[0] => href="http://example.com" rel="nofollow"
)
[1] => Array
(
[0] => http://example.com
)
)
Upvotes: 0
Views: 94
Reputation: 12389
You can optionally capture it using a lookahead:
$regex = '~<a\b(?=(?>[^>]*rel\s*=\s*["\']([^"\']+))?)[^>]*href=\s*["\']\s*\K[^"\']+~';
Add the i (PCRE_CASELESS) modifier after the closing delimiter ~ to match case-insensitively.
See the further explanation and example on regex101 and the SO Regex FAQ.
When using preg_match_all, you may want to add the PREG_SET_ORDER flag:
preg_match_all($regex, $str, $out, PREG_SET_ORDER);
print_r($out);
Which gives a result like this:
Array
(
[0] => Array
(
[0] => http://example.com
[1] => nofollow
)
[1] => Array
(
[0] => http://example2.com
[1] => nofollow
)
)
See test at eval.in
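For reference, here is a minimal self-contained sketch of the approach above. The sample input `$str` and the `$links` array are made up for illustration; the second link deliberately has no rel attribute, to show that the capture group is simply missing in that case.

```php
<?php
// Hypothetical sample input: one link with rel, one without.
$str = '<a href="http://example.com" rel="nofollow">Duis</a> '
     . '<a href="http://example2.com">aute</a>';

// Same pattern as above, with the i modifier added.
$regex = '~<a\b(?=(?>[^>]*rel\s*=\s*["\']([^"\']+))?)[^>]*href=\s*["\']\s*\K[^"\']+~i';

preg_match_all($regex, $str, $out, PREG_SET_ORDER);

$links = [];
foreach ($out as $m) {
    // $m[0] is the URL (everything after \K); $m[1] is the rel value.
    // When rel is absent the group did not participate, so treat a
    // missing or empty entry as "no rel".
    $rel = (isset($m[1]) && $m[1] !== '') ? $m[1] : null;
    $links[] = ['href' => $m[0], 'rel' => $rel];
}

print_r($links);
```

Normalising a missing rel to null keeps the calling code from having to check whether index 1 exists on every match.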
As others mentioned, regex is not the ideal tool for parsing HTML. Whether it is adequate depends on what you want to achieve and what the input looks like, in particular whether it is your own input and you know what to expect.
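If the input is not under your control, a parser-based alternative may be more robust. The following is only a sketch using PHP's bundled DOM extension rather than regex; the sample `$html` and the `$links` array are hypothetical, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical sample input: one link with rel, one without.
$html = '<p>Lorem <a href="http://example.com" rel="nofollow">Duis</a> '
      . 'ipsum <a href="http://example2.com">aute</a></p>';

// Collect libxml warnings internally instead of emitting them,
// since real-world markup is often not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if (!$a->hasAttribute('href')) {
        continue; // skip anchors without a target
    }
    $links[] = [
        'href' => $a->getAttribute('href'),
        'rel'  => $a->hasAttribute('rel') ? $a->getAttribute('rel') : null,
    ];
}

print_r($links);
```

Unlike the regex, this does not depend on attribute order or quoting style.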
Upvotes: 0
Reputation: 67968
\s+href\s*=\s*[\"\']?(([^\s\"\']+)[\"\'\s]+rel="[^"]*")|\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+
You can use this. It will return rel if it is present. See the demo:
http://regex101.com/r/jT3pG3/4
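As a rough sketch of how this pattern might be used from PHP: the `/` delimiters, the i modifier, the sample `$str`, and the `$links` array are assumptions added for illustration. Reading the pattern, the first alternative only applies when rel follows href directly and is double-quoted; group 1 then holds the URL together with the rel attribute, group 2 the URL alone. Otherwise the second alternative puts the URL in group 3.

```php
<?php
// Hypothetical sample input: one link with rel, one without.
$str = '<a href="http://example.com" rel="nofollow">Duis</a> '
     . '<a href="http://example2.com">aute</a>';

// Pattern from above, wrapped in / delimiters (an assumption).
$regex = '/\s+href\s*=\s*[\"\']?(([^\s\"\']+)[\"\'\s]+rel="[^"]*")|\s+href\s*=\s*[\"\']?([^\s\"\']+)[\"\'\s]+/i';

preg_match_all($regex, $str, $out, PREG_SET_ORDER);

$links = [];
foreach ($out as $m) {
    // Group 1 is non-empty only when the first alternative matched.
    $withRel = $m[1] ?? '';
    $links[] = [
        'href'         => $withRel !== '' ? $m[2] : ($m[3] ?? ''),
        'href_and_rel' => $withRel !== '' ? $withRel : null,
    ];
}

print_r($links);
```

Note that the rel value is not captured on its own here; getting it separately would need another capture group around `[^"]*`.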
Upvotes: 1