Reputation: 470
I'm trying to find ALL images in my blog posts with regex. The code below finds an image only if the markup is clean and the src attribute comes right after <img. However, some of my images have other attributes, such as height and width, and my regex does not pick those up. Any ideas?
The following code returns images that look like this:
<img src="blah_blah_blah.jpg">
But not images that look like this:
<img width="290" height="290" src="blah_blah_blah.jpg">
Here is my code:
$pattern = '/<img\s+src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Upvotes: 1
Views: 2434
Reputation: 70750
Use DOM or another parser for this; don't try to parse HTML with regular expressions.
$html = <<<DATA
<img width="290" height="290" src="blah.jpg">
<img src="blah_blah_blah.jpg">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
    echo $img->getAttribute('src') . "\n";
}
Output
blah.jpg
blah_blah_blah.jpg
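If you need the src values as an array rather than printed, the same DOM/XPath approach can be wrapped in a small function. This is a minimal sketch; the name extractImageSrcs is hypothetical. It assumes a non-empty HTML string and uses libxml_use_internal_errors() so that warnings about sloppy real-world markup are collected instead of printed. The XPath //img[@src] skips <img> elements that have no src at all.

```php
<?php
// Hypothetical helper: collect the src of every <img> in an HTML string.
function extractImageSrcs(string $html): array
{
    // Blog markup is rarely valid HTML: keep libxml warnings internal
    // instead of printing them, and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    // Only <img> elements that actually carry a src attribute.
    foreach ($xpath->query('//img[@src]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<p>Post</p><img width="290" height="290" src="blah.jpg">'
      . '<img alt=\'x\' src=\'other.png\'>';
print_r(extractImageSrcs($html));
```

Attribute order and quote style don't matter here, because the parser has already turned the markup into elements and attributes before the query runs.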
Upvotes: 4
Reputation: 5937
Have you considered using DOMDocument instead of regex?
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach ($imageTags as $tag) {
    echo $tag->getAttribute('src') . "\n";
}
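A sketch of the same getElementsByTagName() idea applied to tags the question's regex struggles with. The sample markup is made up for illustration. libxml's HTML parser normalises tag and attribute names to lowercase, so uppercase tags, attribute order and self-closing syntax make no difference. getAttribute() returns an empty string when the attribute is missing, so hasAttribute() is used to skip images without a src.

```php
<?php
// Illustrative markup: uppercase tag, src in the middle, and an <img> with no src.
$markup = '<IMG SRC="upper.jpg" /><img width="290" src="b.jpg" height="290">'
        . '<img alt="no source here">';

libxml_use_internal_errors(true); // silence warnings about imperfect markup
$doc = new DOMDocument();
$doc->loadHTML($markup);

$srcs = [];
foreach ($doc->getElementsByTagName('img') as $tag) {
    // getAttribute() would return '' for a missing src, so check first.
    if ($tag->hasAttribute('src')) {
        $srcs[] = $tag->getAttribute('src');
    }
}
echo implode("\n", $srcs), "\n";
```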
Upvotes: 3
Reputation: 310
Try this:
$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';
This accepts single or double quotes, and the src attribute can appear anywhere in the tag.
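One caveat with .*? is that it can run past the closing > of the <img> tag, and without the s modifier it cannot cross a newline. The sketch below compares the answer's pattern with a tightened variant that swaps .*? for [^>]*?; that variant is my own assumption, not part of the answer, and the sample HTML is made up.

```php
<?php
// The answer's pattern: either quote style, src anywhere in the tag.
$loose = '/<img\s.*?src=["\']([^"\']+)["\']/i';
// Assumed variant: [^>]*? cannot leave the <img> tag, and unlike . it
// also matches newlines inside a tag that spans several lines.
$tight = '/<img\s[^>]*?src=["\']([^"\']+)["\']/i';

$html = "<img width=\"290\"\n     src='single.jpg'>\n"
      . '<img alt="logo"><script src="app.js"></script>';

preg_match_all($loose, $html, $m1);
preg_match_all($tight, $html, $m2);

print_r($m1[1]); // loose pattern results
print_r($m2[1]); // tightened pattern results
```

On this input the loose pattern misses the multi-line tag and instead picks up the <script> src, while the tightened one returns only the image.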
Upvotes: 1
Reputation: 44851
The problem is that you only accept whitespace (\s+) between <img and src. Try this instead:
$pattern = '/<img\s+[^>]*?src="([^"]+)"[^>]*>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
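Note that preg_match() stops after the first match, while the question asks for all images in a post. A minimal sketch using preg_match_all() instead; the sample $data is made up, and the trailing part of the tag is left out of the pattern because it isn't needed to capture src.

```php
<?php
$data = '<p>Intro</p><img src="blah_blah_blah.jpg">'
      . '<img width="290" height="290" src="blah.jpg">';

// Stop at the closing quote of src; the rest of the tag doesn't matter.
$pattern = '/<img\s+[^>]*?src="([^"]+)"/i';

// preg_match_all() returns the number of matches; with the default
// PREG_PATTERN_ORDER, $matches[1] holds every captured src value.
$count = preg_match_all($pattern, $data, $matches);
foreach ($matches[1] as $src) {
    echo $src, "\n";
}
```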
Upvotes: 1
Reputation: 91518
You'd be better off using a parser, but here is a way to do it with regex:
$pattern = '/<img\s.*?src="([^"]+)"/i';
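One pitfall with a bare src=" in the pattern is that it also matches inside attributes such as data-src, which lazy-loading plugins often add. The sketch below contrasts the answer's pattern with a hypothetical variant that requires whitespace right before src; the sample tags are made up for illustration.

```php
<?php
// The answer's pattern.
$pattern = '/<img\s.*?src="([^"]+)"/i';
// Hypothetical variant: \b ends the tag name, and \s before "src" rules out
// names like data-src (a plain \b would not, since "-" is a non-word char).
$strict = '/<img\b[^>]*?\ssrc="([^"]+)"/i';

$lazyTag  = '<img data-src="lazy.jpg" src="placeholder.gif">';
$plainTag = '<img src="x.jpg">';

preg_match($pattern, $lazyTag, $a);
preg_match($strict, $lazyTag, $b);
preg_match($strict, $plainTag, $c);

echo $a[1], "\n", $b[1], "\n", $c[1], "\n";
```

The answer's pattern captures the data-src value here, while the variant returns the real src. Cases like this are a good reason to prefer a parser.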
Upvotes: 1