Reputation: 3411
I'm trying to match src="URL" tags like the following:
src="http://3.bp.blogspot.com/-ulEY6FtwbtU/Twye18FlT4I/AAAAAAAAAEE/CHuAAgfQU2Q/s320/DSC_0045.JPG"
Basically, anything that has some sort of bp.blogspot URL inside of the src attribute. I have the following, but it's only partially working:
preg_match('/src=\"(.*)blogspot(.*)\"/', $content, $matches);
Upvotes: 0
Views: 354
Reputation: 48070
Use XPath to target only the img tags which have a src value containing blogspot.
Code:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$result = [];
foreach ($xpath->query("//img[contains(@src, 'blogspot')]/@src") as $src) {
    $result[] = $src->nodeValue;
}
var_export($result);
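For a self-contained variant, here is a minimal sketch that wraps the same XPath query in a function. The function name blogspotImageSources and the sample markup are hypothetical, and the libxml_use_internal_errors() call is an added assumption: real-world HTML is often malformed, and this keeps loadHTML() from emitting warnings.

```php
<?php
// Hypothetical helper: returns the src of every <img> whose src contains "blogspot".
function blogspotImageSources(string $html): array
{
    // Assumption: collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query("//img[contains(@src, 'blogspot')]/@src") as $src) {
        $result[] = $src->nodeValue;
    }
    return $result;
}

// Sample input (made up, apart from the URL taken from the question).
$html = '<div>'
      . '<img src="http://3.bp.blogspot.com/-ulEY6FtwbtU/Twye18FlT4I/AAAAAAAAAEE/CHuAAgfQU2Q/s320/DSC_0045.JPG" alt="">'
      . '<img src="http://example.com/other.png" alt="">'
      . '</div>';

$sources = blogspotImageSources($html);
var_export($sources);
```

Only the first image should be returned, since the second src does not contain "blogspot".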
Upvotes: 0
Reputation: 29562
This one accepts all blogspot URLs and allows escaped quotes:
src="((?:[^"]|(?:(?<!\\)(?:\\\\)*\\"))+\bblogspot\.com/(?:[^"]|(?:(?<!\\)(?:\\\\)*\\"))+)"
The URL gets captured to match group 1.
You will need to escape each occurrence of \ and / with an additional \ to use it in preg_match(…).
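As a rough sketch of how the pattern might be used from PHP: instead of escaping by hand, this variant swaps in a nowdoc string (so PHP leaves every backslash untouched) and uses ~ as the delimiter (so / needs no escaping). Both choices are assumptions made for illustration, not part of the pattern itself, and the sample $content is made up apart from the URL from the question.

```php
<?php
// Nowdoc: no string escaping is applied, so the regex is taken verbatim.
// "~" as delimiter means the "/" inside the pattern needs no backslash.
$pattern = <<<'REGEX'
~src="((?:[^"]|(?:(?<!\\)(?:\\\\)*\\"))+\bblogspot\.com/(?:[^"]|(?:(?<!\\)(?:\\\\)*\\"))+)"~
REGEX;

// Hypothetical sample input with one blogspot image and one unrelated image.
$content = '<img src="http://3.bp.blogspot.com/-ulEY6FtwbtU/Twye18FlT4I/AAAAAAAAAEE/CHuAAgfQU2Q/s320/DSC_0045.JPG">'
         . '<img src="http://example.com/other.png">';

// preg_match_all collects every occurrence; group 1 holds the captured URL.
preg_match_all($pattern, $content, $matches);
$urls = $matches[1];

print_r($urls);
```

With preg_match() instead of preg_match_all(), only the first matching src would be captured, in $matches[1].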
Explanation:
src="                           # needle 1
(                               # start of capture group
  (?:                           # start of anonymous group
    [^"]                        # non-quote chars
    |                           # or:
    (?:(?<!\\)(?:\\\\)*\\")     # escaped chars
  )+                            # end of anonymous group
  \b                            # start of word (word boundary)
  blogspot\.com/                # needle 2
  (?:                           # start of anonymous group
    [^"]                        # non-quote chars
    |                           # or:
    (?:(?<!\\)(?:\\\\)*\\")     # escaped chars
  )+                            # end of anonymous group
)                               # end of capture group
"                               # needle 3
Upvotes: 3