Ashesh

Matching all hrefs and srcs in a string with preg_match

I'm trying to extract all the hrefs and srcs from a string like this:

$content = "
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium
voluptatum deleniti Image: <img src = 'http://example.com/check-3.png' /> Link: <a href ='http://example.com/test.xls'>test.xls</a>";

Basically, what I want to do is change example.com to a different domain name (say, test.com) and then extract all the filenames from the hrefs and srcs. I was able to do the domain name replacement with a simple str_replace, but now I'm stuck trying to extract the hrefs and srcs.
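Both steps can be sketched roughly as follows. This assumes test.com is the new domain and that "filename" means the last path segment of each URL. str_replace, parse_url (with the PHP_URL_PATH component) and basename are standard PHP functions; filenameFromUrl is a hypothetical helper name used only for illustration, and $content is a cut-down version of the string above.

```php
<?php
// Hypothetical helper (illustration only): returns the last path segment
// of a URL, e.g. "http://test.com/test.xls" -> "test.xls".
function filenameFromUrl(string $url): string
{
    // PHP_URL_PATH asks parse_url() for the path only ("/test.xls"), so any
    // query string or fragment is ignored. It returns null when there is no
    // path (and false for badly malformed URLs); both cast to "".
    $path = parse_url($url, PHP_URL_PATH);
    return basename((string) $path);
}

$content = "Image: <img src = 'http://example.com/check-3.png' /> "
         . "Link: <a href ='http://example.com/test.xls'>test.xls</a>";

// Step 1: swap the domain (assumption: a plain substring replacement is enough).
$content = str_replace('example.com', 'test.com', $content);

// Step 2, once a URL has been extracted: keep only its filename.
echo filenameFromUrl('http://test.com/test.xls'), "\n"; // test.xls
```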

Here's what I tried:

$regex = "/src=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";

This seems to work if there is no space between src (or href) and the =, but if there is one (as in src = '...' and href ='...' in the string above), it does not. I've tried adding the space character, but then the preg match fails. I don't want to use a heavy library like Simple HTML DOM; besides, I don't think it would work, as it's not a proper HTML document. It's a string coming out of CKEditor.
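One way around both problems, sketched here as an alternative rather than as the asker's or answerer's method, is to match each attribute on its own with \s* on both sides of the =, so that src and href no longer have to appear in pairs or in a fixed order. This assumes attribute values contain no whitespace or quote characters; the pattern is illustrative, not a full HTML parser.

```php
<?php
// Sketch only: assumes attribute values contain no whitespace or quotes.
$content = "Image: <img src = 'http://example.com/check-3.png' /> "
         . "Link: <a href ='http://example.com/test.xls'>test.xls</a>";

// \s* on both sides of "=" accepts src='...', src = '...' and href ='...'.
// Each attribute is matched independently, so order and pairing don't matter.
// Group 1 captures the attribute name, group 2 the URL.
$pattern = '/\b(src|href)\s*=\s*["\']?([^"\'\s>]+)/i';

preg_match_all($pattern, $content, $matches);

// Default PREG_PATTERN_ORDER: $matches[2] lists every captured URL.
$urls = $matches[2];

print_r($urls);
```

With the default PREG_PATTERN_ORDER, $matches[1] holds the attribute names and $matches[2] the URLs, so filenames can then be taken from $urls one by one. Note that \b also lets attributes such as data-src through.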

Answers (1)

Andrew Cheong

Why not just add quantifiers on the space?

$regex = "/src *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";
               ^  ^                                       ^  ^
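As a quick check of this approach, the sketch below runs the pattern through preg_match_all on a shortened version of the question's string. It assumes the same " *" quantifiers are applied around href's = as well, because that string contains href ='...'.

```php
<?php
$content = "Image: <img src = 'http://example.com/check-3.png' /> "
         . "Link: <a href ='http://example.com/test.xls'>test.xls</a>";

// The answer's pattern, with " *" around "=" for both src and href.
$regex = "/src *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";

if (preg_match_all($regex, $content, $m)) {
    // $m[1]: each src value; $m[2]: the href that follows it in the same match.
    print_r($m[1]);
    print_r($m[2]);
}
```

Because the pattern consumes an src followed by a later href, each match yields one src/href pair: an src with no href after it (or an href with no src before it) is not captured at all, and since . does not match newlines without the s modifier, each pair must sit on a single line.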
