Gökhan Ege

Reputation: 55

PHP regex match only specific elements

My HTML is:

<li>
    <a href="/prod_1"></a>
    <img src="/preview_1.jpg" data-image-href="//cdn.example.com/zoom_1.jpg" />
</li>
<li>
    <a href="/prod_2"></a>
    <img src="/preview_2.jpg" data-image-href="//cdn.example.com/zoom_2.jpg" />
</li>
...

I am using this regex:

/(src|href)=("[^"]+")/siU

Results are:

[2][0] => "/prod_1"
[2][1] => "/preview_1.jpg"
[2][2] => "//cdn.example.com/zoom_1.jpg"
[2][3] => "/prod_2"
[2][4] => "/preview_2.jpg"
[2][5] => "//cdn.example.com/zoom_2.jpg"
...

After adding <img.* to the start of the regex, the results are distorted. I need to match the src and href attributes only inside img elements. What is the right way to achieve that?

Upvotes: 1

Views: 48

Answers (1)

Wiktor Stribiżew

Reputation: 626861

You can limit the matched characters to [^>] (anything but a closing angle bracket), so that only the attributes inside img tags are captured:

(?:<img\s*?|(?<!^)\G)\s*?([^>=]+)="([^"]*?)"(?=.*?\/>)

See demo.
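To make the pattern easier to follow, here is a sketch that assembles the same expression from commented pieces. The variable names ($re, $html, $count, $m) and the one-line sample input are only for illustration, and the siU flags are the ones taken from the question.

```php
<?php
// Build the pattern piece by piece; the concatenated string is the same
// expression as the one shown above, wrapped in / delimiters with siU flags.
$re = '/'
    . '(?:<img\s*?'   // either start at an opening <img tag ...
    . '|(?<!^)\G)'    // ... or resume where the previous match ended (but not at the string start)
    . '\s*?'          // optional whitespace before the attribute
    . '([^>=]+)'      // group 1: attribute name, made of anything except ">" and "="
    . '="([^"]*?)"'   // group 2: attribute value between double quotes
    . '(?=.*?\/>)'    // a closing "/>" must still follow somewhere later
    . '/siU';

// Illustrative input: the href inside <a> should be skipped, the src inside <img> kept.
$html = '<a href="/x"></a><img src="/p.jpg" />';
$count = preg_match_all($re, $html, $m);

print_r($m[1]); // attribute names found inside <img>
print_r($m[2]); // matching attribute values
```

Keep in mind that the U flag inverts greediness, so with it \s*? behaves greedily and [^>=]+ lazily. As far as I can tell, that is why the name in group 1 comes back without leading whitespace.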

Adding PHP sample code:

$re = "/(?:<img\\s*?|(?<!^)\\G)\\s*?([^>=]+)=\"([^\"]*?)\"(?=.*?\\/>)/siU"; 
$str = "<li>\n    <a href=\"/prod_1\"></a>\n    <img src=\"/preview_1.jpg\" data-image-href=\"//cdn.example.com/zoom_1.jpg\" />\n</li>\n<li>\n    <a href=\"/prod_2\"></a>\n    <img src=\"/preview_2.jpg\" data-image-href=\"//cdn.example.com/zoom_2.jpg\" />\n</li>"; 
preg_match_all($re, $str, $matches);
// $matches[1] holds the attribute names, $matches[2] the attribute values.
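If you would rather have one name/value pair per attribute, the PREG_SET_ORDER flag groups the captures of each match together. The following is only a sketch under that assumption: the pattern and sample HTML are the ones from above, while the $attributes array and its 'name'/'value' keys are names I made up for illustration.

```php
<?php
// Same pattern as above; a single-quoted string avoids the double escaping.
$re = '/(?:<img\s*?|(?<!^)\G)\s*?([^>=]+)="([^"]*?)"(?=.*?\/>)/siU';

// Sample HTML from the question, written as a nowdoc so no quotes need escaping.
$str = <<<'HTML'
<li>
    <a href="/prod_1"></a>
    <img src="/preview_1.jpg" data-image-href="//cdn.example.com/zoom_1.jpg" />
</li>
<li>
    <a href="/prod_2"></a>
    <img src="/preview_2.jpg" data-image-href="//cdn.example.com/zoom_2.jpg" />
</li>
HTML;

// PREG_SET_ORDER yields one array per match: [full match, name, value].
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Collect the pairs under hypothetical 'name' and 'value' keys.
$attributes = [];
foreach ($matches as $match) {
    $attributes[] = ['name' => $match[1], 'value' => $match[2]];
}

print_r($attributes);
```

Only the attributes inside the img tags should end up in $attributes; the href values of the a elements are skipped because a match can only start at <img or continue directly after a previous match.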

Upvotes: 4
