Reputation: 579
I'm trying to convert URLs, but not if they come after src=". So far, I have this...
return preg_replace('@(?!^src=")(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
It converts the URL, but it does so even when src=" comes right before it.
Upvotes: 0
Views: 88
Reputation: 47894
I must infer the intent of this task in the absence of a minimal verifiable example.
By leveraging a legitimate DOM parser, you can largely prevent the matching of non-text nodes which contain otherwise qualifying URL values.
The snippet below uses an XPath query to avoid matching a URL that is already the child of an <a> tag. By only targeting text(), there is no chance of replacing tag attribute values.
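To make the effect of that query concrete, here is a minimal sketch that only lists which text nodes get selected. The sample markup and the variable names ($direct, $strict) are made up for illustration. It also tries a stricter variant, //text()[not(ancestor::a)], which is my own suggestion rather than part of the answer's code: the child-based query still picks up text that sits deeper inside a link (for example inside a <b> within an <a>).

```php
<?php
// Hypothetical markup: one URL in an attribute, one in plain text,
// and one nested inside <b> within an existing link.
$html = '<div><img src="http://example.com/a.png"> see http://example.com/b '
      . '<a href="http://example.com/c"><b>http://example.com/c</b></a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Text nodes whose direct parent is anything but <a>.
$direct = [];
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $direct[] = $textNode->nodeValue;
}

// Assumed refinement: text nodes with no <a> ancestor at any depth.
$strict = [];
foreach ($xpath->query('//text()[not(ancestor::a)]') as $textNode) {
    $strict[] = $textNode->nodeValue;
}

var_dump($direct, $strict);
```

Neither query ever returns the src value, because attributes are not text nodes. Only the stricter query skips the URL nested inside the existing link.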
What comes next is some of the clever magic while looping over the text nodes.
Use preg_match_all() to isolate one or more URLs in each text node, then create a new <a> element to replace the respective URL segment of text.
Use splitText() to "spit out" the leading portion of text before the URL -- it stays in the current node, while the URL and any text after it are returned as a new node.
Use replaceChild() to replace that returned node with the new <a> node.
Use insertBefore() to add the text that originally followed the URL as a new text node directly after the new <a> node.
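To see what splitText() does on its own, here is a small sketch on a hand-built node. The sample string and the variable names ($tail, $rest) are assumptions for illustration. Note that it is a variant, not the approach described above: it calls splitText() a second time to cut off the trailing text instead of re-creating it with insertBefore().

```php
<?php
$dom = new DOMDocument();
$p = $dom->appendChild($dom->createElement('p'));
$text = $p->appendChild($dom->createTextNode('before http://example.com after'));

$url = 'http://example.com';

// First split: $text keeps the leading text, $tail starts at the URL.
$tail = $text->splitText(mb_strpos($text->nodeValue, $url));

// Second split: $tail keeps only the URL, $rest holds the trailing text.
$rest = $tail->splitText(mb_strlen($url));

// Swap the URL-only text node for a hyperlink.
$a = $dom->createElement('a', htmlspecialchars($url));
$a->setAttribute('href', $url);
$p->replaceChild($a, $tail);

echo $dom->saveHTML($p);
// <p>before <a href="http://example.com">http://example.com</a> after</p>
```

Both new nodes stay attached to the same parent, so the paragraph's text order is preserved without any manual re-insertion.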
Code:
$html = <<<HTML
<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link http://example.com/number2 then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another HTTPS://www.example.net/booyah</p> and done
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$regex = '#\bhttps?://[-\w.]+(?::\d+)?(?:/(?:[\w/_.-]*(?:\?\S+)?)?)?#ui';
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $text = $textNode->nodeValue;
    foreach (preg_match_all($regex, $text, $m) ? $m[0] : [] as $url) {
        $a = $dom->createElement('a', htmlspecialchars($url));
        $a->setAttribute('href', $url);
        $mbPosOfUrlInText = mb_strpos($text, $url);
        // regurgitate any leading text as a new preceding node
        // then replace remainder of text with new hyperlink
        $textNode->parentNode->replaceChild(
            $a,
            $textNode->splitText($mbPosOfUrlInText)
        );
        // add any text after url as new text node after new hyperlink;
        // continue from that trailing node so a further URL in the same
        // text is located relative to the remaining text
        $text = mb_substr($text, $mbPosOfUrlInText + mb_strlen($url));
        $textNode = $textNode->parentNode->insertBefore(
            $dom->createTextNode($text),
            $a->nextSibling
        );
    }
}
echo $dom->saveHTML();
Output:
<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link <a href="http://example.com/number2">http://example.com/number2</a> then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another <a href="HTTPS://www.example.net/booyah">HTTPS://www.example.net/booyah</a></p> and done
</div>
Upvotes: 0