CSmith

Reputation: 11

Need regex help in PHP 5

Admittedly, I am not the best at working with regular expressions. I am scraping a page and trying to fix the src values of the embedded img tags so that they point back to the original domain. This is the regex I have been trying variations of (too many to list; here is the current one):

preg_match_all('/<img\b[^>]*>/i', $html, $images);  

What this ends up doing is replacing every < with />. What I need it to do is just return the (currently) five images on the page in an array, so that I can fix their src values and then write them back to $html, which is set at the beginning of the file:

$html = file_get_contents($target_url);

Upvotes: 1

Views: 85

Answers (1)

lonesomeday

Reputation: 237847

Basically, don't do this with regex. You can parse HTML with regex, but it is almost certainly not worth the effort.

Do it with genuine DOM parsing instead, using the DOMDocument class:

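$dom = new DOMDocument;
// Parse the scraped markup into a DOM tree.
$dom->loadHTML($html);
// Collect every <img> element in the document.
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    // Prefix each src with the original domain (this assumes every src is relative).
    $image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
// Serialize the modified document back into a string.
$html = $dom->saveHTML();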
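
If some of the images already have absolute URLs, or the scraped markup is not valid HTML, the snippet above may need a little more care. Here is a rough sketch of one way to handle that. The function name absolutizeImageSources and the $baseUrl parameter are made up for illustration, and it takes the simple route of resolving every relative src against the site root, which will be wrong for pages that live in a subdirectory. It avoids type declarations so that it should also run on PHP 5.

```php
<?php
// Hypothetical helper (not part of any library): prefixes relative <img> src
// values with $baseUrl. $baseUrl is an assumption standing in for the origin
// of the scraped site.
function absolutizeImageSources($html, $baseUrl)
{
    $dom = new DOMDocument;

    // Scraped markup is rarely valid, so have libxml collect its warnings
    // internally instead of emitting them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');

        // Skip empty values, anything that starts with a scheme
        // (http:, https:, data:, ...) and protocol-relative URLs (//host/...).
        if ($src === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//)~i', $src)) {
            continue;
        }

        // Simplification: treat every remaining src as relative to the site root.
        $image->setAttribute('src', rtrim($baseUrl, '/') . '/' . ltrim($src, '/'));
    }

    return $dom->saveHTML();
}
```

Usage would then be something like $html = absolutizeImageSources($html, 'http://example.com');. Note that loadHTML() followed by saveHTML() serializes a whole document, so a bare fragment comes back wrapped in a doctype and html/body elements.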

Upvotes: 5
