My HTML is:
<li>
<a href="/prod_1"></a>
<img src="/preview_1.jpg" data-image-href="//cdn.example.com/zoom_1.jpg" />
</li>
<li>
<a href="/prod_2"></a>
<img src="/preview_2.jpg" data-image-href="//cdn.example.com/zoom_2.jpg" />
</li>
...
I am using this regex:
/(src|href)=("[^"]+")/siU
Results are:
[2][0] => "/prod_1"
[2][1] => "/preview_1.jpg"
[2][2] => "//cdn.example.com/zoom_1.jpg"
[2][3] => "/prod_2"
[2][4] => "/preview_2.jpg"
[2][5] => "//cdn.example.com/zoom_2.jpg"
...
After adding <img.* to the start of the regex, the results are distorted. I need to match src and href attributes only inside IMG elements. What is the right way to achieve that?
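A minimal PHP sketch that reproduces the prefixed pattern, assuming it was run with the same /siU flags as above. U (ungreedy) makes .* lazy, so each match runs from <img only to the first src or href in that tag, and the next match has to start at another <img. The data-image-href values are therefore never captured.

```php
<?php
// Reproduces the attempt: the original pattern with "<img.*" prepended, same /siU flags.
// Under U, ".*" is lazy, so each match stops at the first src/href inside one <img> tag.
$re = '/<img.*(src|href)=("[^"]+")/siU';

$str = <<<'HTML'
<li>
 <a href="/prod_1"></a>
 <img src="/preview_1.jpg" data-image-href="//cdn.example.com/zoom_1.jpg" />
</li>
<li>
 <a href="/prod_2"></a>
 <img src="/preview_2.jpg" data-image-href="//cdn.example.com/zoom_2.jpg" />
</li>
HTML;

preg_match_all($re, $str, $matches);
print_r($matches[1]); // only "src" twice
print_r($matches[2]); // the two quoted src values; data-image-href is skipped
```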
You can limit the matched characters to [^>] (anything except a closing angle bracket) and capture only the img attributes:
(?:<img\s*?|(?<!^)\G)\s*?([^>=]+)="([^"]*?)"(?=.*?\/>)
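The pattern returns every quoted attribute of each img tag, with the name in group 1 and the value in group 2. If only src/href-style names are wanted, one option is to collect the matches with PREG_SET_ORDER (so each name stays next to its value) and filter the names afterwards. This is a minimal sketch under that assumption; the helper name imgLinkAttributes and the "ends in src or href" filter rule are illustrative choices, not part of the answer.

```php
<?php
// Sketch: run the \G-based pattern and keep only src/href-style attributes.
// imgLinkAttributes is a hypothetical helper name, not a PHP built-in.
function imgLinkAttributes(string $html): array
{
    // Same pattern and flags as above; with U, "*?" becomes greedy and "+" lazy.
    $re = '/(?:<img\s*?|(?<!^)\G)\s*?([^>=]+)="([^"]*?)"(?=.*?\/>)/siU';
    preg_match_all($re, $html, $sets, PREG_SET_ORDER);

    $pairs = [];
    foreach ($sets as $set) {
        // $set[1] is the attribute name, $set[2] its value without quotes.
        // Assumed rule: keep "src", "href" and names ending in "-src" or "-href".
        if (preg_match('/(?:^|-)(?:src|href)$/i', $set[1])) {
            $pairs[] = [$set[1], $set[2]];
        }
    }
    return $pairs;
}

// The <a href> is ignored and the alt attribute is filtered out.
print_r(imgLinkAttributes('<p><a href="/x"></a><img alt="pic" src="/a.jpg" data-zoom-href="/b.jpg" /></p>'));
```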
PHP sample code (group 1 captures each attribute name, group 2 its value):
$re = "/(?:<img\\s*?|(?<!^)\\G)\\s*?([^>=]+)=\"([^\"]*?)\"(?=.*?\\/>)/siU";
$str = "<li>\n <a href=\"/prod_1\"></a>\n <img src=\"/preview_1.jpg\" data-image-href=\"//cdn.example.com/zoom_1.jpg\" />\n</li>\n<li>\n <a href=\"/prod_2\"></a>\n <img src=\"/preview_2.jpg\" data-image-href=\"//cdn.example.com/zoom_2.jpg\" />\n</li>";
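preg_match_all($re, $str, $matches);
print_r($matches[1]); // src, data-image-href, src, data-image-href
print_r($matches[2]); // the corresponding attribute values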
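For comparison, the same extraction can be done with PHP's DOM extension instead of a regex. This is a swapped-in technique, not the answer's method, and the sketch assumes ext-dom is enabled; the helper name imgLinkAttributesDom and the "ends in src or href" filter rule are illustrative.

```php
<?php
// Sketch: extract src/href-style attributes of <img> elements with DOMDocument.
// imgLinkAttributesDom is a hypothetical helper name, not a PHP built-in.
function imgLinkAttributesDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $pairs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        foreach ($img->attributes as $attr) {
            // Assumed rule: keep "src", "href" and names ending in "-src" or "-href".
            if (preg_match('/(?:^|-)(?:src|href)$/i', $attr->name)) {
                $pairs[] = [$attr->name, $attr->value];
            }
        }
    }
    return $pairs;
}

print_r(imgLinkAttributesDom('<li><a href="/prod_1"></a><img src="/preview_1.jpg" data-image-href="//cdn.example.com/zoom_1.jpg" /></li>'));
```

Unlike the regex, this version does not depend on a self-closing "/>" appearing later in the input, and it returns attribute values with entities such as &amp; decoded.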