I'm trying to find only the links without www, like http://google.com or https://facebook.com, and then add www to those same links so they become http://www.google.com or https://www.facebook.com, etc.
However, my pattern has a problem: it matches all links, with or without www.
$text = '<a href="http://google.com">google</a> bla bla bla <a href="https://www.google.com">google</a>';
preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU", $text, $matches);
foreach ($matches[2] as $old_url)
{
    // $new_url should be $old_url with www added (not computed yet)
    $text = str_replace($old_url, $new_url, $text);
}
Here is sample code using the regex
<a\s[^>]*href=(["']?)(?>https?|ftps?):\/\/(?![^'">]*www[^"]+?\1)([^'">]+?)\1[^>]*>(.*?)<\/a>
to match only those URLs in href attributes that do not contain www.
$re = "/<a\\s[^>]*href=([\"']?)(?>https?|ftps?):\/\/(?![^'\">]*www[^\"]+?\\1)([^'\">]+?)\\1[^>]*>(.*?)<\\/a>/";
$str = "<a href=\"http://google.com\">google</a> bla bla bla <a href=\"https://www.google.com\">google</a> bla bla bla <a href=\"http://facebook.com\">facebook</a>\n";
print ($str . "\n");
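$str = preg_replace_callback(
    $re,
    function ($matches) {
        // $matches[0] is the whole <a>...</a> element, $matches[2] the URL part after "://".
        // Matching "://" too keeps the link text unchanged even if it equals the host.
        return str_replace('://' . $matches[2], '://www.' . $matches[2], $matches[0]);
    },
    $str
);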
print ($str);
Output:
<a href="http://www.google.com">google</a> bla bla bla <a href="https://www.google.com">google</a> bla bla bla <a href="http://www.facebook.com">facebook</a>
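A minimal sketch, assuming absolute http(s)/ftp(s) URLs: the same negative-lookahead idea can be applied to a single URL string, for example to each $old_url in the question's loop. add_www() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: prefix "www." to an absolute http(s)/ftp(s) URL
// whose host does not already start with "www.".
function add_www(string $url): string
{
    // (?!www\.) is a negative lookahead: the pattern only matches
    // when the text right after "://" is not "www.".
    return preg_replace('~^(https?|ftps?)://(?!www\.)~i', '$1://www.', $url) ?? $url;
}

foreach (['http://google.com', 'https://www.facebook.com', 'https://facebook.com/page'] as $url) {
    echo add_www($url), "\n";
}
// http://www.google.com
// https://www.facebook.com
// https://www.facebook.com/page
```

Because the lookahead sits right after "://", only the start of the host is inspected, so a "www" appearing later in the path does not stop the rewrite.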
I would consider using DOM and XPath to take care of this for you.
$doc = new DOMDocument;
@$doc->loadHTML($html); // $html holds the markup; @ suppresses warnings about malformed HTML
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a[not(contains(@href, "www."))]/@href');
foreach ($links as $link) {
    // process your URLs via $link->nodeValue
    // ...
}
You could then use parse_url() to rebuild each URL with www while processing it.
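A minimal end-to-end sketch of that idea, assuming the hrefs are absolute URLs with a scheme and host (relative or malformed ones are skipped). It reuses the XPath query above, rebuilds each URL from the parts returned by parse_url() with www. in front of the host, and writes it back to the attribute; any user:password part is dropped. The sample $html is made up for illustration.

```php
<?php
// Sample markup (made up for illustration).
$html = '<a href="http://google.com">google</a> bla bla bla '
      . '<a href="https://www.google.com">google</a>';

$doc = new DOMDocument;
@$doc->loadHTML($html); // @ suppresses libxml warnings about imperfect markup
$xpath = new DOMXPath($doc);

// Same query as above: href attributes of <a> elements that lack "www."
foreach ($xpath->query('//a[not(contains(@href, "www."))]/@href') as $attr) {
    $parts = parse_url($attr->value);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        continue; // relative or malformed URL: leave it untouched
    }
    // Rebuild the URL with "www." before the host (user:pass is not kept).
    $attr->value = $parts['scheme'] . '://www.' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '')
        . (isset($parts['query']) ? '?' . $parts['query'] : '')
        . (isset($parts['fragment']) ? '#' . $parts['fragment'] : '');
}

$result = $doc->saveHTML();
echo $result;
```

Working on the parsed DOM avoids the quoting and nesting edge cases a regex has to handle. Note that saveHTML() on a fragment may wrap it in <html> and <body> elements (plus a doctype), so you may need to strip those if you only want the fragment back.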