joshholat

Reputation: 3411

Get src value containing a specific keyword from all <img> tags

I'm trying to match src="URL" attributes like the following:

src="http://3.bp.blogspot.com/-ulEY6FtwbtU/Twye18FlT4I/AAAAAAAAAEE/CHuAAgfQU2Q/s320/DSC_0045.JPG"

Basically, anything that has some sort of bp.blogspot URL inside the src attribute. I have the following, but it's only partially working:

preg_match('/src=\"(.*)blogspot(.*)\"/', $content, $matches);
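One likely reason the pattern only partly works: each (.*) is greedy, so when a line holds more than one attribute or tag, the match runs on to the last quote it can reach, and preg_match() stops after the first match anyway. Below is a minimal sketch comparing it with a [^"]* variant and preg_match_all(). The sample $content is made up for illustration.

```php
<?php
// Made-up sample: two <img> tags on one line, only the first from blogspot.
$content = '<img src="http://3.bp.blogspot.com/a/s320/DSC_0045.JPG" alt="x">'
         . ' <img src="http://example.com/b.jpg">';

// The pattern from the question: the greedy (.*) groups run on to the
// last quote on the line, so the match swallows alt="x" and the second tag.
preg_match('/src=\"(.*)blogspot(.*)\"/', $content, $greedy);
echo $greedy[0], "\n";

// [^"]* cannot cross a quote, so each match stays inside one src value,
// and preg_match_all() collects every match instead of only the first.
preg_match_all('/src="([^"]*blogspot[^"]*)"/', $content, $all);
print_r($all[1]);
```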

Upvotes: 0

Views: 354

Answers (2)

mickmackusa

Reputation: 48070

Use XPath to target only the img tags which have a src value containing blogspot.

Code:

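libxml_use_internal_errors(true); // don't emit warnings for imperfect real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the markup to search
$xpath = new DOMXPath($doc);
$result = [];
// Select the src attribute of every <img> whose src contains "blogspot"
foreach ($xpath->query("//img[contains(@src, 'blogspot')]/@src") as $src) {
    $result[] = $src->nodeValue;
}
var_export($result);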
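The contains() check accepts "blogspot" anywhere in the URL, including a path or query string. If only hosts like the question's 3.bp.blogspot.com should count, the XPath results can be narrowed with parse_url(). This is a minimal sketch: the sample $html and the ".bp.blogspot.com" suffix test are assumptions chosen for illustration.

```php
<?php
// Made-up sample: the second URL mentions "blogspot" only in its query string.
$html = '<p><img src="http://3.bp.blogspot.com/x/s320/DSC_0045.JPG">'
      . '<img src="http://example.com/pic.jpg?ref=blogspot"></p>';

libxml_use_internal_errors(true); // don't emit warnings for imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = [];
foreach ($xpath->query("//img[contains(@src, 'blogspot')]/@src") as $src) {
    // Assumed rule: keep the URL only if its host ends in ".bp.blogspot.com".
    $host = parse_url($src->nodeValue, PHP_URL_HOST);
    if (is_string($host) && preg_match('/\.bp\.blogspot\.com$/i', $host)) {
        $result[] = $src->nodeValue;
    }
}
var_export($result);
```

Keeping the contains() predicate is optional. It simply avoids calling parse_url() on images that cannot match.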

Upvotes: 0

Regexident

Reputation: 29562

This one accepts all blogspot URLs and allows escaped quotes:

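src="((?:(?:(?<!\\)(?:\\\\)*\\")|[^"])+\bblogspot\.com/(?:(?:(?<!\\)(?:\\\\)*\\")|[^"])+)"

The URL gets captured to match group 1.

The escaped-quote branch comes first in each group on purpose. [^"] also matches a backslash, so if it were tried first, an escaped quote after blogspot.com/ would end the match early.

You will need to escape \ and / with an additional \ (for each occurrence!) to use it in preg_match(…). The / only needs escaping when / is also the pattern delimiter.

Explanation:

src=" # needle 1
( # start of capture group
    (?: # start of anonymous group
        (?:(?<!\\)(?:\\\\)*\\") # escaped quote (with any escaped backslashes before it)
        | # or:
        [^"] # non-quote chars
    )+ # end of anonymous group
    \b # start of word (word boundary)
    blogspot\.com/ # needle 2
    (?: # start of anonymous group
        (?:(?<!\\)(?:\\\\)*\\") # escaped quote (with any escaped backslashes before it)
        | # or:
        [^"] # non-quote chars
    )+ # end of anonymous group
) # end of capture group
" # needle 3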

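Here is a minimal sketch of the pattern embedded in PHP. It assumes a single-quoted string, so every regex backslash is doubled, and ~ as the delimiter, so the / in blogspot.com/ needs no escaping. In this sketch the escaped-quote branch is listed before [^"] in each group, so an escaped quote after blogspot.com/ stays inside the capture. The sample strings are made up for illustration.

```php
<?php
// Made-up sample input: one blogspot image, one image from elsewhere.
$content = '<img src="http://3.bp.blogspot.com/-ulEY6FtwbtU/Twye18FlT4I/AAAAAAAAAEE/CHuAAgfQU2Q/s320/DSC_0045.JPG">'
         . '<img src="http://example.com/photo.jpg">';

// Regex backslashes are doubled for the single-quoted PHP string;
// ~ is the delimiter, so "/" stays unescaped.
$pattern = '~src="((?:(?:(?<!\\\\)(?:\\\\\\\\)*\\\\")|[^"])+\\bblogspot\\.com/(?:(?:(?<!\\\\)(?:\\\\\\\\)*\\\\")|[^"])+)"~';

// preg_match_all finds every match; group 1 holds each URL.
preg_match_all($pattern, $content, $matches);
print_r($matches[1]);

// An escaped quote after blogspot.com/ stays inside the captured URL.
preg_match($pattern, 'src="http://a.blogspot.com/x\"y"', $escaped);
echo $escaped[1], "\n"; // http://a.blogspot.com/x\"y
```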
Upvotes: 3
