Reputation: 970
I want a regex that removes all external links from my content and keeps only the links to a given domain.
For example:
$inputContent = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';
Expected output:
$outputContent = 'Lorem Ipsum lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';
I tried this solution, but it's not working:
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *>.*?</a>#i';
$filteredString = preg_replace($pattern, '', $inputContent);
Upvotes: 2
Views: 1285
Reputation: 19375
Tried with this solution but it's not working.
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *>.*?</a>#i';
You were close. To make your solution work, remove just one >, the one right after \1 *, i.e.:
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *.*?</a>#i';
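The reason this helps: in the original pattern, " *>" requires the tag to close right after the href value, but the sample links have target="_blank" in that position, so nothing matched. Below is a minimal sketch that checks the corrected pattern against the input from the question. The variable names come from the question; the caveat in the comments is an observation about the lookbehind, not something the original attempt claimed to handle.

```php
<?php
$inputContent = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

// Corrected pattern: the ">" that followed " *" has been removed, so any
// attributes after href (such as target="_blank") are consumed by ".*?".
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *.*?</a>#i';

// Caveat: the lookbehind only looks for the text "mywebsite" somewhere in
// the URL, so a link such as http://example.com/mywebsite.html is kept too.
$filteredString = preg_replace($pattern, '', $inputContent);

// Prints the text with only the mywebsite.com link left in place.
echo $filteredString, PHP_EOL;
```

If the domain check needs to be stricter than a substring test, comparing the parsed host name is likely the safer route.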
Upvotes: 0
Reputation: 331
The solution with regex:
$inputContent = 'Lorem Ipsum <a href=\'http://www.example1.com\' target="_blank"><strong>http://www.example1.com</strong></a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';
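function callback($matches) {
    // $matches[1] is the href value, $matches[2] is the anchor's inner HTML.
    // Keep the link only when it points to mywebsite.com (with or without www.).
    // (/.*)? also accepts a bare trailing slash such as http://www.mywebsite.com/
    if (preg_match('#^https?://(www\.)?mywebsite\.com(/.*)?$#i', $matches[1])) {
        return '<a href="' . $matches[1] . '" target="_blank">' . $matches[2] . '</a>';
    }
    // return ''; // use this instead to drop the link text as well
    return $matches[2]; // remove only the anchor tag and keep its text
}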
$pattern = '#<a[^>]*href=[\'"]([^\'"]*)[\'"][^>]*>(((?!<a\s).)*)</a>#i';
$filteredString = preg_replace_callback($pattern, 'callback', $inputContent);
echo $filteredString;
Upvotes: 0
Reputation: 48711
What you need here is not really regular expressions. You are parsing HTML, so you should choose the right tool for the job: DOMDocument.
<?php
$html = <<< HTML
Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a>
lorem ipsum dummy text
<a href="http://mywebsite.com" target="_blank">http://www.mywebsite.com</a>
HTML;
$dom = new \DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($dom);
$site = 'mywebsite.com';
// Select all <a> elements whose href does not start with your website's domain name
$anchors = $xpath->query("//a[not(starts-with(@href,'http://{$site}')) and not(starts-with(@href,'http://www.{$site}'))]");
foreach ($anchors as $anchor) {
    // Remove the whole anchor, including its link text
    $anchor->parentNode->removeChild($anchor);
}
echo $dom->saveHTML();
Output:
<p>Lorem Ipsum
lorem ipsum dummy text
<a href="http://mywebsite.com" target="_blank">http://www.mywebsite.com</a></p>
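The starts-with() test above is a prefix match, so it would not recognise https:// links and would keep a host such as mywebsite.com.evil.test. A possible variant, sketched under a few assumptions, compares the parsed host name instead. The function name removeExternalLinks and the wrapping <div> are hypothetical choices made for this illustration, and links without a host (relative URLs) are assumed to be internal and kept.

```php
<?php
// Hypothetical helper: removes every <a> whose href host is neither
// $site nor www.$site. Anchors without a host (relative URLs) are kept.
function removeExternalLinks(string $html, string $site): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // getElementsByTagName() returns a live list, so copy it before removing nodes.
    $anchors = iterator_to_array($dom->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $host = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // no host component: treat as an internal link
        }
        $host = strtolower($host);
        if ($host !== $site && $host !== 'www.' . $site) {
            $anchor->parentNode->removeChild($anchor);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> '
      . 'lorem ipsum dummy text <a href="https://mywebsite.com/page" target="_blank">my page</a>';
echo removeExternalLinks($html, 'mywebsite.com'), PHP_EOL;
```

To keep the link text and drop only the tag, the removeChild() call could be swapped for replacing the anchor with a text node built from its textContent.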
Upvotes: 2