hotstuff

Reputation: 21

Using preg_replace_callback() to extract all images from a string of HTML

Tricky preg_replace_callback question here. I am admittedly not great at PCRE expressions.

I am trying to extract all img src values from a string of HTML, save them to an array, and additionally replace each img src path with a local path (not a remote one). For example, I might have the following, surrounded by a lot of other HTML:

img src='http://www.mysite.com/folder/subfolder/images/myimage.png'

And I would want to extract myimage.png to an array, and additionally change the src to:

src='images/myimage.png'

Can that be done?

Thanks

Upvotes: 2

Views: 1580

Answers (2)

svoop

Reputation: 3454

Do you need regex for this? Not necessarily. Is regex the most readable solution? Probably not, at least unless you are fluent in regex. Is regex more efficient when scanning large amounts of data? Most likely: PHP compiles each pattern and caches it on first use. Does regex win the "least lines of code" trophy?

$string = <<<EOS
<html>
<body>
blahblah<br>
<img src='http://www.mysite.com/folder/subfolder/images/myimage.png'>blah<br>
blah<img src='http://www.mysite.com/folder/subfolder/images/another.png' />blah<br>
</body>
</html>
EOS;

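// Capture the src value of every <img> tag; [^>]*? keeps the match inside a single tag.
preg_match_all("%<img\s[^>]*?src=['\"](.*?)['\"]%s", $string, $matches);
// Keep only the file name and prefix it with the local images/ folder.
$images = array_map(function ($element) { return preg_replace("%^.*/(.*)$%", 'images/$1', $element); }, $matches[1]);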

print_r($images);

Two lines of code; that's hard to undercut in PHP. The result is the following $images array:

Array
(
  [0] => images/myimage.png
  [1] => images/another.png
)

Please note that this won't work with PHP versions prior to 5.3 unless you replace the anonymous function with a proper one.
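
The snippet above only builds the array; it does not change the src values inside the HTML itself. Since the question asks for both, here is a minimal sketch with preg_replace_callback(), assuming every image should end up under a local images/ folder. The function name localize_images() and the sample input are made up for illustration, and the pattern only handles quoted src attributes.

```php
<?php
// Hypothetical helper: collects image file names and rewrites each src
// to images/<file name>. Returns array(rewritten HTML, file names).
function localize_images($html)
{
    $found = array();
    // Group 1: tag text up to "src=", group 2: quote character, group 3: original URL.
    $pattern = '%(<img\s[^>]*?src=)([\'"])(.*?)\2%is';
    $result = preg_replace_callback(
        $pattern,
        function ($m) use (&$found) {
            $file = basename($m[3]);   // strip everything up to the last slash
            $found[] = $file;          // remember the file name
            return $m[1] . $m[2] . 'images/' . $file . $m[2];
        },
        $html
    );
    return array($result, $found);
}

$html = "<p>blah<img src='http://www.mysite.com/folder/subfolder/images/myimage.png'>blah</p>";
list($newHtml, $files) = localize_images($html);
echo $newHtml, "\n";
print_r($files);
```

Like the two-liner, this needs PHP 5.3 or later because of the anonymous function. It is a sketch rather than a robust parser: URLs with query strings and attributes such as data-src would need extra handling.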

Upvotes: 1

Álvaro González

Reputation: 146450

Does it need to use regular expressions? Handling HTML is normally easier with DOM functions:

<?php

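$domd = new DOMDocument();
// Collect libxml warnings internally instead of printing them; real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$domd->loadHTML(file_get_contents("http://stackoverflow.com"));
libxml_use_internal_errors(false);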

$items = $domd->getElementsByTagName("img");
$data = array();

foreach($items as $item) {
  $data[] = array(
    "src" => $item->getAttribute("src"),
    "alt" => $item->getAttribute("alt"),
    "title" => $item->getAttribute("title"),
  );
}

print_r($data);
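
The same DOM approach can also rewrite the src attributes, which the question asks for. This is a rough sketch, assuming the HTML is already in a string and that every image should point at a local images/ folder; the sample markup and variable names are illustrative.

```php
<?php
$html = "<html><body><p>blah<img src='http://www.mysite.com/folder/subfolder/images/myimage.png' alt='x'>blah</p></body></html>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);    // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_use_internal_errors(false);

$files = array();
foreach ($doc->getElementsByTagName("img") as $img) {
    $file = basename($img->getAttribute("src"));
    $files[] = $file;                               // remember the file name
    $img->setAttribute("src", "images/" . $file);   // point the tag at the local copy
}

// Serialize the modified document back to a string.
$newHtml = $doc->saveHTML();
print_r($files);
echo $newHtml;
```

Note that saveHTML() returns a full document and normalizes the markup (for example, attribute quoting), so the output will not be byte-for-byte identical to the input apart from the src values.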

Upvotes: 3
