onegun

Reputation: 803

Regex extract image links

I am reading some HTML content that contains image tags such as

<img onclick="document.location='http://abc.com'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">

or

<img src="http://a.com/e.jpg" onclick="document.location='http://abc.com'" onload="javascript:if(this.width>250) this.width=250" />

I am trying to reformat these tags so that they become

<img src="http://a.com/e.jpg" />

However, I have not been successful. The code I have tried so far is:

$image=preg_replace('/<img(.*?)(\/)?>/','',$image);
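The attempt above replaces each matched tag with an empty string, so the images are deleted rather than rewritten. Its lazy (.*?) also stops at the first >, which in the first example tag is the one inside this.width>250. A minimal regex sketch using preg_replace_callback() follows. It assumes the src value is quoted and that no other attribute value contains the text " src=". keep_img_src() is a hypothetical helper name, and a regex like this is still not a real HTML parser.

```php
<?php
// Sketch: rewrite every <img> tag so that only its src attribute survives.
// Assumptions: src is quoted (single or double), and no quoted attribute value
// elsewhere contains the text " src=". keep_img_src() is a hypothetical name.
function keep_img_src(string $html): string
{
    // Match a whole <img ...> tag. Quoted values are consumed as units, so a
    // ">" inside onload="...this.width>250..." does not end the match early.
    $tagPattern = '/<img\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>/i';

    $result = preg_replace_callback($tagPattern, function (array $m): string {
        // Look for a src attribute preceded by whitespace (so data-src is skipped).
        if (preg_match('/\ssrc\s*=\s*("[^"]*"|\'[^\']*\')/i', $m[0], $src)) {
            return '<img src=' . $src[1] . ' />';
        }
        return $m[0]; // no quoted src: leave the tag untouched
    }, $html);

    return $result ?? $html; // preg_replace_callback() returns null on error
}

$html = '<img onclick="document.location=\'http://abc.com\'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">';
echo keep_img_src($html), "\n";
```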

Can anyone help?

Upvotes: 1

Views: 200

Answers (2)

user2609094


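Here's a version using DOMDocument that removes all attributes from <img> tags except for the src attribute. Note that a loadHTML()/saveHTML() round trip through DOMDocument can alter other parts of the HTML as well, especially if that HTML is malformed, so be careful: test it and check that the results are acceptable. Also note that saveHTML() writes void elements in HTML form, e.g. <img src="http://a.com/e.jpg"> without the trailing slash.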

<?php

$html = <<<ENDHTML
<!doctype html>
<html><body>
<a href="#"><img onclick="..." src="http://a.com/e.jpg" onload="..."></a>

<div><p>
<img src="http://a.com/e.jpg" onclick="..." onload="..." />
</p></div>
</body></html>
ENDHTML;

$dom = new DOMDocument;
if (!$dom->loadHTML($html)) {
    throw new Exception('could not load html');
}

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//img') as $img) {
    // unfortunately, cannot removeAttribute() directly inside
    // the loop, as this breaks the attributes iterator.
    $remove = array();
    foreach ($img->attributes as $attr) {
        if (strcasecmp($attr->name, 'src') != 0) {
            $remove[] = $attr->name;
        }
    }

    foreach ($remove as $attr) {
        $img->removeAttribute($attr);
    }
}

echo $dom->saveHTML();
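The comment in the loop above notes that calling removeAttribute() inside a forward foreach breaks the attributes iterator. One alternative, sketched here with made-up sample markup, is to walk the attribute map by index from the end, so each removal only touches a position that has already been visited. It uses getElementsByTagName() instead of XPath, which is equivalent for a plain //img query.

```php
<?php
// Sketch: same "keep only src" rule, but removing attributes in one pass by
// iterating the attribute map backwards. The sample markup is illustrative.
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them

$dom = new DOMDocument();
$dom->loadHTML('<!doctype html><html><body>'
    . '<img onclick="..." src="http://a.com/e.jpg" onload="...">'
    . '<img src="http://a.com/e.jpg" onclick="..." onload="..." />'
    . '</body></html>');

foreach ($dom->getElementsByTagName('img') as $img) {
    $attrs = $img->attributes;
    // Start at the last index: removing item $i cannot shift items 0..$i-1.
    for ($i = $attrs->length - 1; $i >= 0; $i--) {
        $name = $attrs->item($i)->nodeName;
        if (strcasecmp($name, 'src') !== 0) {
            $img->removeAttribute($name);
        }
    }
}

$out = $dom->saveHTML();
echo $out;
```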

Upvotes: 1

Srb1313711

Reputation: 2047

Match one piece at a time, then concatenate the strings. I am unsure which language you are using, so I'll explain in pseudocode:

1. Find <img with a regex and place the match in a string variable.
2. Find src="..." with src=".*?" and place the match in a string variable.
3. Find the closing /> with \/> and place the match in a string variable.
4. Concatenate the variables.
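The four steps above can be sketched in PHP (the language of the question) for a single tag string that has already been isolated. rebuild_img_tag() is a hypothetical name, and the sketch assumes a double-quoted src. Because the first example tag ends with a bare > rather than />, step 3 falls back to "/>" when no match is found, so both examples produce the requested output.

```php
<?php
// Sketch of the piece-by-piece approach, applied to ONE tag string at a time.
// Assumes the src value is double-quoted. rebuild_img_tag() is hypothetical.
function rebuild_img_tag(string $tag): ?string
{
    // Step 1: find "<img" at the start of the tag.
    if (!preg_match('/^<img\b/i', $tag, $open)) {
        return null; // not an <img> tag
    }
    // Step 2: find src="..." (lazy match up to the next double quote).
    if (!preg_match('/\bsrc=".*?"/i', $tag, $src)) {
        return null; // no double-quoted src attribute
    }
    // Step 3: find the closing "/>"; fall back to "/>" when the tag ends with ">".
    $close = preg_match('/\/>$/', $tag, $end) ? $end[0] : '/>';
    // Step 4: concatenate the pieces.
    return $open[0] . ' ' . $src[0] . ' ' . $close;
}

echo rebuild_img_tag('<img onclick="document.location=\'http://abc.com\'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">'), "\n";
```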

Upvotes: 0
