Giorgio

Reputation: 1970

Trim substrings within PHP regular expressions

I have a string which may contain a pattern like:

LINK([anchor text],[link])

What I would like to do is transform this expression into an HTML link:

<a href="link">anchor text</a>

At the moment, I'm performing the replacement with the following PHP snippet:

$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$replace = '<a href="$2">$1</a>';
echo preg_replace($search, $replace, $string);

The problem I'm facing is the run of spaces after the anchor text. Fortunately, HTML collapses multiple spaces into a single one, but in this example the link would still end with an annoying (underlined) space. Is there any way to trim the anchor text? I can't match it the way I match the "link" substring, since the anchor text may contain spaces.

Upvotes: 2

Views: 93

Answers (3)

Sean

Reputation: 2416

What you can do in this case is make the quantifier in the first group lazy.

$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

Can be changed to:

$search = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

Notice the question mark after the plus, and the \s* added before the comma. The question mark makes the quantifier lazy: the regex engine matches as few characters as it can while still letting the rest of the pattern succeed.

In this case, the shortest the group can match is the anchor text itself, because the spaces that follow it can be consumed by the \s* in front of the comma.

In the original pattern, the quantifier is greedy. This means it tries to match as many characters as possible, so the .+ takes everything up to the comma, trailing spaces included.
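
To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The variable names ($greedy, $lazy, $greedyResult, $lazyResult) are just for illustration and do not come from the original post.

```php
<?php
// Input string and replacement template taken from the question.
$string  = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$replace = '<a href="$2">$1</a>';

// Greedy first group: .+ keeps the spaces in front of the comma.
$greedy = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

// Lazy first group followed by \s*: the spaces are matched by \s* instead.
$lazy = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

$greedyResult = preg_replace($greedy, $replace, $string);
$lazyResult   = preg_replace($lazy, $replace, $string);

echo $greedyResult, PHP_EOL; // anchor text still ends with spaces
echo $lazyResult, PHP_EOL;   // <a href="http://mydomain.com">some anchor text</a>
```

Note that the lazy quantifier alone would not be enough: without the \s* in front of the comma, the group would still have to include the trailing spaces in order to reach the comma.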

Here is a regex101 of the code.

Upvotes: 1

Jonny 5

Reputation: 12389

You could make the relevant quantifiers lazy, so that they don't eat up the whitespace before the , or the ):

'/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/'

by adding a ? after each +.
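
As a quick check, the following sketch applies this pattern with preg_match and prints both captured groups in brackets, so any leftover whitespace would be visible. The names $pattern, $anchor and $link are illustrative assumptions, and the input is the string from the question.

```php
<?php
// Pattern from this answer; both capture groups use lazy quantifiers.
$pattern = '/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/';
$string  = 'LINK(  some anchor text    ,   http://mydomain.com  )';

$anchor = null;
$link   = null;

if (preg_match($pattern, $string, $m) === 1) {
    $anchor = $m[1]; // first group: the anchor text
    $link   = $m[2]; // second group: the link target
}

// Brackets make any surrounding whitespace easy to spot.
printf("[%s] [%s]\n", $anchor, $link); // [some anchor text] [http://mydomain.com]
```

Unlike the pattern in the question, this one has no \s* between LINK and the opening parenthesis, so an input such as LINK (...) would not match.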


Upvotes: 1

Jerry

Reputation: 71578

Assuming that the anchor text cannot contain commas or more than one space in a row, you could perhaps use:

LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)

regex101 demo

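Instead of .+, I'm using [^\s,]+(?:\s[^\s,]+)*, which matches one word, optionally followed by more words, each separated from the previous one by a single whitespace character (where a word is a run of one or more characters that are neither whitespace nor commas). Because the group must end on a word, it can never capture trailing spaces.

I also changed the negated class [^\s], which appears later in your pattern, to the equivalent \S.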
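
Here is a small sketch of how this pattern behaves, including the stated assumption about consecutive spaces. The delimiters, the variable names and the second input string are illustrative additions, not part of the original answer.

```php
<?php
// Pattern from this answer, wrapped in / delimiters for preg_replace.
$pattern = '/LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)/';
$replace = '<a href="$2">$1</a>';

// String from the question: words separated by single spaces.
$string  = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$trimmed = preg_replace($pattern, $replace, $string);

// Hypothetical input with two spaces inside the anchor text.
$doubleSpace = 'LINK(some  anchor, http://mydomain.com)';
$unchanged   = preg_replace($pattern, $replace, $doubleSpace);

echo $trimmed, PHP_EOL;   // <a href="http://mydomain.com">some anchor text</a>
echo $unchanged, PHP_EOL; // no match, so the input comes back unchanged
```

The second call shows the trade-off: when the anchor text contains two spaces in a row, the pattern does not match at all and preg_replace returns the subject as it was.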

Upvotes: 2
