user340538

Reputation: 183

Convert hashtags to hyperlinks without partially matching htmlentities

I want to replace all occurrences of #word with an HTML link. I have written a preg_replace() call for this:

$text = preg_replace('~#([\p{L}|\p{N}]+)~u', '<a href="/?aranan=$1">#$1</a>', $text);

The problem is that this regular expression also matches HTML numeric character references like &#039; and therefore corrupts the output.

I need to exclude alphanumeric substrings which are preceded by &#, but I do not know how to do that using regular expressions.
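
To make the problem concrete, here is a minimal sketch of the corruption. The sample string and the variable names $sample and $broken are illustrative assumptions, not taken from real data; the pattern and replacement are the ones shown above.

```php
<?php
// Hypothetical sample input containing one numeric character reference and one hashtag.
$sample = 'It&#039;s #Test';

// Same pattern and replacement as in the question.
$broken = preg_replace(
    '~#([\p{L}|\p{N}]+)~u',
    '<a href="/?aranan=$1">#$1</a>',
    $sample
);

// The "#039" inside "&#039;" is wrapped in a link, which breaks the entity.
echo $broken, PHP_EOL;
```

This prints It&<a href="/?aranan=039">#039</a>;s <a href="/?aranan=Test">#Test</a>, so the apostrophe entity is no longer intact.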

Upvotes: 1

Views: 124

Answers (4)

mickmackusa

Reputation: 47864

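  1. Use a (*SKIP)(*FAIL) subpattern to consume whole sequences which should not be replaced (here, numeric character references such as &#039;). Then, as the second alternative, write your hash-prefixed, multibyte-safe word subpattern to match any substrings which were not disqualified. This eliminates pattern ambiguity and ensures replacement accuracy.
  2. Inside a character class, a pipe is a literal character rather than an alternation operator, so it is not needed to separate \p{L} and \p{N}. The curly braces around single-letter property names can be removed too.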
  3. In the replacement, if you are generating HTML <a> elements, then properly URL-encode the href value and HTML-encode the printed link text.

Code:

$text = '#Test &#039; #039foo "#bär"';

echo preg_replace_callback(
    '~&#\d+;(*SKIP)(*FAIL)|#([\pL\pN]+)~u',
    fn($m) => sprintf(
        '<a href="/?%s">#%s</a>',
        http_build_query(['aranan' => $m[1]]),
        htmlentities($m[1])
    ),
    $text
);

Unrendered output:

<a href="/?aranan=Test">#Test</a> &#039; <a href="/?aranan=039foo">#039foo</a> "<a href="/?aranan=b%C3%A4r">#b&auml;r</a>"

Rendered HTML:

#Test ' #039foo "#bär"

Upvotes: 0

Christopher Richa

Reputation: 1295

You would need to use a [A-Za-z] character class in your regular expression so that it matches only letters and no digits.

I will edit with an example later on.
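
As a rough sketch of what a letters-only class does in practice (the sample string and the variable names $input and $linked are assumptions made for illustration, not part of the original question):

```php
<?php
// Hypothetical sample input: a hashtag, a numeric entity, and a digit-leading hashtag.
$input = '#Test &#039; #039foo';

// ASCII-letters-only character class in place of [\p{L}|\p{N}].
$linked = preg_replace(
    '~#([A-Za-z]+)~',
    '<a href="/?aranan=$1">#$1</a>',
    $input
);

echo $linked, PHP_EOL;
```

This prints <a href="/?aranan=Test">#Test</a> &#039; #039foo. The decimal entity is left alone, but the trade-off is that hashtags starting with a digit, and non-ASCII letters, are no longer matched, and a hexadecimal entity such as &#x27; would still be partially matched at #x.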

Upvotes: -1

holms

Reputation: 9560

http://gskinner.com/RegExr/

Use this online regular expression constructor. It has an explanation for every flag you may want to use, and you will see the matches highlighted in your example text.

And yes, use [a-zA-Z].

Upvotes: 0

Kamil Szot

Reputation: 17817

'~(?<!&)#([\p{L}|\p{N}]+)~u'

That's a negative lookbehind assertion: http://www.php.net/manual/en/regexp.reference.assertions.php

It matches # only if it is not immediately preceded by &.
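
A minimal usage sketch of the pattern above with preg_replace(), reusing the replacement string from the question. The sample string and the variable names $text and $result are assumptions for illustration only.

```php
<?php
// Hypothetical sample input containing one numeric character reference and one hashtag.
$text = 'Tom&#039;s #Test';

// The lookbehind (?<!&) rejects any # that directly follows an ampersand.
$result = preg_replace(
    '~(?<!&)#([\p{L}|\p{N}]+)~u',
    '<a href="/?aranan=$1">#$1</a>',
    $text
);

echo $result, PHP_EOL;
```

This prints Tom&#039;s <a href="/?aranan=Test">#Test</a>. Note that the lookbehind inspects only the single preceding character, so every # directly after & is skipped, whether or not it belongs to a real entity.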

Upvotes: 2
