Reputation: 1970
I have a string which may contain a pattern like:
LINK([anchor text],[link])
What I would like to do is transform this expression into an HTML link:
<a href="link">anchor text</a>
At the moment, I'm performing the replacement with the following PHP snippet:
$string = 'LINK( some anchor text , http://mydomain.com )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$replace = '<a href="$2">$1</a>';
echo preg_replace($search, $replace, $string);
The problem I'm facing is the whitespace after the anchor text. HTML collapses multiple spaces into a single space, but in this example the link would still be displayed with an annoying (underlined) trailing space. Is there any way to trim the anchor text? I can't match it the same way as the "link" substring, since the anchor text may contain spaces.
Upvotes: 2
Views: 93
Reputation: 2416
What you can do in this case is make the first group match lazily.
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
Can be changed to:
$search = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';
Notice the question mark after the plus, and the \s* added before the comma. The question mark tells the regex engine to match as few characters as possible.
In this case, the shortest string the group can match is the anchor text, because it must be followed by any number of spaces and then a comma.
In the original pattern, the group matches greedily. This means that it tries to match as many characters as possible, so the .+ consumes everything up to the last comma, including the trailing spaces.
Here is a regex101 of the code.
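To see the difference concretely, here is a minimal sketch that runs both patterns against the sample string from the question. The variable names ($greedy, $lazy, $greedyResult, $lazyResult) are my own and only there for illustration.

```php
<?php
// Sample input and replacement template from the question.
$string = 'LINK( some anchor text , http://mydomain.com )';
$replace = '<a href="$2">$1</a>';

// Original pattern: the greedy (.+) also captures the space before the comma.
$greedy = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
// Modified pattern: the lazy (.+?) followed by \s* leaves that space out of group 1.
$lazy = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

$greedyResult = preg_replace($greedy, $replace, $string);
$lazyResult = preg_replace($lazy, $replace, $string);

echo $greedyResult, "\n"; // <a href="http://mydomain.com">some anchor text </a>
echo $lazyResult, "\n";   // <a href="http://mydomain.com">some anchor text</a>
```

Only the trailing space inside the anchor differs between the two results.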
Upvotes: 1
Reputation: 12389
You could make the relevant quantifiers lazy, so that they don't eat up the whitespace before , or ):
'/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/'
by adding a ? after each +.
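A side effect of the lazy quantifier is that several expressions in one string are replaced separately. A rough sketch, assuming a made-up input with two links (the URLs and the $result variable are hypothetical, not from the question):

```php
<?php
// Hypothetical input containing two LINK(...) expressions.
$string = 'See LINK( first , http://a.example ) and LINK( second , http://b.example )';

// Pattern from this answer: both capturing groups use lazy quantifiers.
$search = '/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/';
$replace = '<a href="$2">$1</a>';

$result = preg_replace($search, $replace, $string);
echo $result, "\n";
// See <a href="http://a.example">first</a> and <a href="http://b.example">second</a>
```

With a greedy (.+), the first group would instead extend to the last comma in the string, and the two expressions would be merged into a single match.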
Upvotes: 1
Reputation: 71578
Assuming that the anchor text cannot contain commas or more than 1 space in a row, you could perhaps use:
LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)
Instead of .+, I'm using [^\s,]+(?:\s[^\s,]+)* which will match one word, optionally followed by more words, each separated by a single whitespace character (where a word is a series of one or more characters that are neither whitespace nor a comma).
I also changed your negated class [^\s], which appears later on, to \S.
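A quick sketch of how this pattern behaves, using the sample string from the question plus a second, made-up input with two consecutive spaces in the anchor text to illustrate the assumption stated above. The / delimiters and the variable names are my additions.

```php
<?php
$search = '/LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)/';
$replace = '<a href="$2">$1</a>';

// Sample string from the question: the anchor text is captured without
// leading or trailing whitespace.
$ok = preg_replace($search, $replace, 'LINK( some anchor text , http://mydomain.com )');
echo $ok, "\n"; // <a href="http://mydomain.com">some anchor text</a>

// Hypothetical input with two consecutive spaces inside the anchor text:
// the pattern does not match, so preg_replace would leave it unchanged.
$matched = preg_match($search, 'LINK( some  text , http://mydomain.com )');
var_dump($matched); // int(0)
```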
Upvotes: 2