Reputation: 949
I'm trying to extract all the hrefs and srcs from a string like this:
$content = "
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium
voluptatum deleniti Image: <img src = 'http://example.com/check-3.png' /> Link: <a href ='http://example.com/test.xls'>test.xls</a>";
Basically, what I want to do is change example.com to a different domain name (say test.com) and then extract all the filenames from the hrefs and srcs. I was able to do the domain name replacement with a simple str_replace, but now I'm stuck trying to extract the hrefs and srcs.
Here's what I tried:
$regex = "/src=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";
This seems to work if there is no space between src (or href) and the = (e.g. src=) but if there is a space (e.g. src =) it does not work. I've tried adding the space character to the pattern, but then the preg match fails. I don't want to use a heavy library like Simple HTML DOM; besides, I don't think it would work, as the input is not a proper HTML document. It's a string coming out of CKEditor.
Upvotes: 0
Views: 141
Reputation: 30273
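Why not just add quantifiers on the space? Your sample also has a space before the = of href, so href needs the same treatment as src:
$regex = "/src *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";
               ^  ^                                       ^  ^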
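If the srcs and hrefs can appear in any order or number, matching each attribute on its own may be easier than one pattern that expects a src followed by an href. The following is only a sketch under a few assumptions: attribute values are always quoted, the sample string is a shortened version of the one in the question, and the variable names $urls and $files are made up for illustration.

```php
<?php
// Sample input adapted from the question (shortened).
$content = "Image: <img src = 'http://example.com/check-3.png' /> "
         . "Link: <a href ='http://example.com/test.xls'>test.xls</a>";

// Match either attribute, tolerate whitespace around "=", and require the
// closing quote to be the same character as the opening one (backreference \1).
// Assumption: values are always quoted; unquoted values are not matched.
$regex = '/\b(?:src|href)\s*=\s*(["\'])(.*?)\1/i';

preg_match_all($regex, $content, $matches);

// Group 2 holds the attribute values, in document order.
$urls = $matches[2];

// Reduce each URL to its filename: take the path part, then its last segment.
$files = array_map(function ($url) {
    return basename(parse_url($url, PHP_URL_PATH));
}, $urls);

print_r($urls);
print_r($files);
```

Because preg_match_all collects every match, this does not depend on a src coming before an href, and it keeps working when the string contains several images or links.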
Upvotes: 1