Mohamed Fawaskhan

Reputation: 51

Using preg_match_all to get an image source by user-defined attributes

HTML code

<img src="http://website/image/ngshjk.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="dynamicvalue" data-imagesize="ppew" data-error-url="http://img.comb/6/z2default.jpg" class="small_image imageZoom " alt="image" title="" id="visible-image-small" rel="dynamicvalue" data-zoom-src="http://img.comb/6/z21347.jpeg" style="display: inline;">

PHP code

preg_match_all('/<img(.*) onload="(.*)" \/s',$con,$val);

This page already has many img tags, so I tried to get the src of one particular image by using some of the attributes inside its img tag. I can't get the preg_match_all pattern right. Please help me correct it so that it returns the src of the img tag above.

Upvotes: 1

Views: 785

Answers (3)

Ro Yo Mi

Reputation: 15000

To get all the image tags on the page, it would probably be much easier to use an HTML parsing tool such as PHP's DOMDocument:

// load your html string
$dom = new DOMDocument();
$dom->loadHTML($your_html_here);


// find all the img tags
$imgs = $dom->getElementsByTagName('img');

// cycle through all image tags
foreach($imgs as $img) {
    $src = $img->getAttribute("src");
    // do something
}
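
Since the question asks for one particular image rather than every image, the loop above could be narrowed with an attribute filter. The following is a minimal sketch, assuming DOMXPath is acceptable and that data-imagesize="ppew" is enough to identify the tag; the sample markup and variable names are made up for illustration.

```php
<?php
// Hypothetical sample markup; in practice $html would hold the fetched page source.
$html = '<html><body>'
      . '<img src="http://website/other.jpeg" data-imagesize="thumb">'
      . '<img src="http://website/image/ngshjk.jpeg" data-imagesize="ppew" id="visible-image-small">'
      . '</body></html>';

// Real-world HTML is rarely perfectly valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select only the img elements whose data-imagesize attribute equals "ppew".
$xpath = new DOMXPath($dom);
$sources = [];
foreach ($xpath->query('//img[@data-imagesize="ppew"]') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Further conditions can be added inside the square brackets, for example `[@data-imagesize="ppew" and @data-pid="dynamicvalue"]`, if a single attribute does not single out the tag.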

Upvotes: 0

Ro Yo Mi

Reputation: 15000

Description

This expression will:

  • validate the image tag has attribute/value of data-imagesize="ppew"
  • validate the image tag has attribute/value of data-pid="ABCDEFGHIJ"
  • capture the src attribute value
  • avoid difficult edge cases, such as attribute values that contain text resembling the attributes being tested


<img\b(?=\s) # capture the open tag
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-imagesize="ppew")  # validate data-imagesize exists with a specific value
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-pid="ABCDEFGHIJ")  # validate data-pid exists with a specific value
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\ssrc=['"]([^"]*)['"]?)  # capture the src attribute value
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?> # get the entire tag


Examples

Live Example: http://www.rubular.com/r/PBJ50cax7L

Single Line Regex: <img\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-imagesize="ppew")(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-pid="ABCDEFGHIJ")(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\ssrc=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>

Sample Text

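Note that the first line contains potentially problematic content: its onmouseover value includes text that looks like the target attributes, while its real data-pid has a different value.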

<img onmouseover=' data-imagesize="ppew" ; data-pid="ABCDEFGHIJ" ; funSwap(data-imagesize, data-pid) ; ' src="http://website/NotTheDroidYourLookingFor.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="jihgfedcba" data-imagesize="ppew" />
<img src="http://website/someurl.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="ABCDEFGHIJ" data-imagesize="ppew" />

Capture Groups

[0] = <img src="http://website/someurl.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="ABCDEFGHIJ" data-imagesize="ppew" />
[1] = http://website/someurl.jpeg
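
For reference, this is a minimal sketch of how the single-line expression might be run from PHP. It assumes `~` as the pattern delimiter and nowdoc strings, both chosen here so that the quotes and slashes in the pattern and sample text need no extra escaping; the variable names are illustrative.

```php
<?php
// The single-line expression from above, wrapped in ~ delimiters (an assumption of this sketch).
$pattern = <<<'REGEX'
~<img\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-imagesize="ppew")(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sdata-pid="ABCDEFGHIJ")(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\ssrc=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>~
REGEX;

// The two sample tags from above.
$html = <<<'HTML'
<img onmouseover=' data-imagesize="ppew" ; data-pid="ABCDEFGHIJ" ; funSwap(data-imagesize, data-pid) ; ' src="http://website/NotTheDroidYourLookingFor.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="jihgfedcba" data-imagesize="ppew" />
<img src="http://website/someurl.jpeg" onload="img_onload(this);" onerror="img_onerror(this);" data-pid="ABCDEFGHIJ" data-imagesize="ppew" />
HTML;

// $matches[0] holds the full tags, $matches[1] the captured src values.
preg_match_all($pattern, $html, $matches);

print_r($matches[1]);
```

With the sample text, only the second tag should match, in line with the capture groups listed above.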

Upvotes: 0

Jerry

Reputation: 71538

You might be better off using the lazy .*? instead of the greedy .*.

preg_match_all('/<img(.*?)\sonload="([^"]*)"/s',$con,$val);

Also change the second .* to [^"]*.

.*? matches as few characters as possible before the next part of the pattern (in this case onload...) can match, and [^"]* matches any run of characters other than double quotes, so it cannot run past the end of the quoted value.
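
To make the difference concrete, here is a small sketch comparing the two patterns on a made-up input with two img tags. The final step that pulls src out of the first capture group is an addition for illustration and is not part of the pattern above.

```php
<?php
// Hypothetical two-tag input, to show where greedy and lazy matching differ.
$con = '<img src="a.jpg" onload="f(this);" alt="x">' . "\n"
     . '<img src="b.jpg" onload="g(this);" alt="y">';

// Greedy: each .* runs as far as it can, so both tags collapse into a single match.
$greedyCount = preg_match_all('/<img(.*)\sonload="(.*)"/s', $con, $greedy);

// Lazy .*? plus [^"]*: each tag is matched on its own.
$lazyCount = preg_match_all('/<img(.*?)\sonload="([^"]*)"/s', $con, $val);

// Extra step for this sketch: read src from the attribute text captured before onload.
$srcs = [];
foreach ($val[1] as $attrs) {
    if (preg_match('/\ssrc="([^"]*)"/', $attrs, $m)) {
        $srcs[] = $m[1];
    }
}

print_r($srcs);
```

Note also that the pattern in the question ends in \/s, which escapes the closing delimiter, whereas the corrected pattern ends in /s. This approach only finds src when it appears before onload in the tag, as it does in the question's HTML.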

Upvotes: 3
