Chris

Reputation: 579

Convert URLs to hyperlinks in HTML without replacing src attribute values

I'm trying to convert URLs, but not if they come after src=". So far, I have this...

return preg_replace('@(?!^src=")(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $s);

It converts the URL, but it does so even when the URL comes right after src=".

Upvotes: 0

Views: 88

Answers (2)

mickmackusa

Reputation: 47894

I must infer the intent of this task in the absence of a minimal verifiable example.

By leveraging a legitimate DOM parser, you can largely prevent the matching of non-text nodes which contain otherwise qualifying URL values.

The snippet below uses an XPath query to skip any URL text that is already a direct child of an <a> tag. Because it targets only text() nodes, tag attribute values are never replaced.

What comes next is some of the clever magic while looping over the text nodes.

Use preg_match_all() to isolate one or more URLs in each text node, then create a new <a> element to replace the respective URL segment of text.

Use splitText() to "spit out" the leading portion of text before the URL -- it will become a new node prior to the current node.

Use replaceChild() to replace the remaining text with the new <a> node.

Use insertBefore() to add the text that originally followed the URL as a new text node right after the new <a> node.
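To see the node mechanics in isolation, here is a minimal sketch on one hand-built text node. The sample string and the hard-coded offsets (4 and 18) are assumptions for illustration only. It also uses a second splitText() call to peel off the trailing text, which is a small variation on the insertBefore() step described above, not the exact method used in the demo.

```php
<?php
// Build <p>see http://example.com now</p> by hand (illustrative sample text).
$dom = new DOMDocument();
$p = $dom->appendChild($dom->createElement('p'));
$text = $p->appendChild($dom->createTextNode('see http://example.com now'));

// splitText() truncates the original node and returns a new sibling
// holding everything from the offset onward.
$rest  = $text->splitText(4);   // $text is now "see ", $rest starts at the URL
$after = $rest->splitText(18);  // $rest is now the URL only, $after is " now"

// Swap the URL-only text node for a hyperlink element.
$url = $rest->nodeValue;
$a = $dom->createElement('a', htmlspecialchars($url));
$a->setAttribute('href', $url);
$p->replaceChild($a, $rest);

echo $dom->saveHTML($p), "\n";
```

The point to notice is that the original node keeps only the leading text after each split, so any offset computed against the full original string is no longer valid for that node.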

Code: (Demo)

$html = <<<HTML
<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link http://example.com/number2 then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another HTTPS://www.example.net/booyah</p> and done
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$regex = '#\bhttps?://[-\w.]+(?::\d+)?(?:/(?:[\w/_.-]*(?:\?\S+)?)?)?#ui';
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $text = $textNode->nodeValue;
    foreach (preg_match_all($regex, $text, $m) ? $m[0] : [] as $url) {
        $a = $dom->createElement('a', htmlspecialchars($url));
        $a->setAttribute('href', $url);
        $mbPosOfUrlInText = mb_strpos($text, $url);
        // regurgitate any leading text as a new preceding node
        // then replace remainder of text with new hyperlink
        $textNode->parentNode->replaceChild(
            $a,
            $textNode->splitText($mbPosOfUrlInText)
        );
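        // add any text after url as new text node after new hyperlink
        $tail = $dom->createTextNode(
            mb_substr($text, $mbPosOfUrlInText + mb_strlen($url))
        );
        $textNode->parentNode->insertBefore($tail, $a->nextSibling);
        // continue with the trailing text so later URLs in the same node
        // are located relative to the node that actually contains them
        $textNode = $tail;
        $text = $tail->nodeValue;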
    }
}
echo $dom->saveHTML();

Output:

<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link <a href="http://example.com/number2">http://example.com/number2</a> then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another <a href="HTTPS://www.example.net/booyah">HTTPS://www.example.net/booyah</a></p> and done
</div>

Upvotes: 0

John Kugelman

Reputation: 361595

Make that a lookbehind assertion.

(?<!^src=")
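For context, here is a rough sketch of how a lookbehind might sit inside the replacement call. Two things are my assumptions and not part of the question: the function name linkifyOutsideSrc() is hypothetical, and the URL pattern is a simplified version of the one in the question. I have also dropped the ^ from the assertion, because inside a lookbehind it would only rule out src=" sitting at the very start of the subject string, which is rarely where it appears in HTML.

```php
<?php
// Hypothetical helper; the name and the simplified URL pattern are assumptions.
function linkifyOutsideSrc(string $s): string
{
    // (?<!src=") rejects a match whose five preceding characters are src="
    // \b keeps the match from starting in the middle of a word
    $pattern = '@(?<!src=")\bhttps?://[-\w.]+(?::\d+)?(?:/[\w/.-]*(?:\?[^\s"<]+)?)?@';

    // $0 is the whole matched URL
    return preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $s);
}

echo linkifyOutsideSrc('<img src="http://example.com/a.png"> see http://example.com/page'), "\n";
```

Note that this only guards against the literal text src=" directly before the URL. Single-quoted attributes, extra whitespace, or other attributes such as href are not covered, which is where a DOM-based approach is more robust.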

Upvotes: 2
