Reputation: 3783
Suppose that we have this code in MATLAB:
ax = 'aa+bb+cc+dd';
middle_part = regexp(ax, '\+(\w+)\+','tokens');
Why does MATLAB only return 'bb' as output, and not 'bb' and 'cc'?
Upvotes: 3
Views: 380
Reputation: 626952
You need to place the second + into a lookahead so that it is not consumed by the regex engine. Here is an answer of mine on how lookaheads work.
Here is a code snippet:
ax = 'aa+bb+cc+dd';
middle_part = regexp(ax, '\+(\w+)(?=\+)','tokens');
disp(middle_part)
Result:
{
[1,1] =
{
[1,1] = bb
}
[1,2] =
{
[1,1] = cc
}
}
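If plain strings are more convenient than the nested cell array shown in the result, the tokens can be flattened. This is only a minimal sketch, assuming each match carries exactly one token (which holds for this pattern, since it has a single capture group); the variable name flat is made up for illustration.

```matlab
ax = 'aa+bb+cc+dd';
middle_part = regexp(ax, '\+(\w+)(?=\+)', 'tokens');

% Each element of middle_part is a 1x1 cell holding one token.
% Concatenating those cells horizontally gives a flat 1x2 cell array.
flat = [middle_part{:}];
disp(flat)
```

After this, flat{1} is 'bb' and flat{2} is 'cc'.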
So, in short, here is what is going on: \+(\w+)\+ matches +bb+ and moves the index to just after the + that follows bb. So, only cc+dd is left to be tested. No further match is found, because the pattern requires a + on each side of one or more word characters.
With the lookahead version, \+(\w+)(?=\+), the engine matches the +bb that is immediately followed by a +, and moves the index to just after the second b. The string left is +cc+dd, so there is another match, +cc. The trailing +dd is not matched, because no + follows it.
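To see the consumption directly, one can ask regexp for the start and end index of every match alongside the tokens. The following is a sketch for illustration, not part of the original answer; the variable names tok1, s1, e1, tok2, s2 and e2 are invented here.

```matlab
ax = 'aa+bb+cc+dd';

% Consuming version: the trailing \+ is part of the match.
[tok1, s1, e1] = regexp(ax, '\+(\w+)\+', 'tokens', 'start', 'end');

% Lookahead version: the trailing + is checked but not consumed.
[tok2, s2, e2] = regexp(ax, '\+(\w+)(?=\+)', 'tokens', 'start', 'end');

% Print the span of each match for both patterns.
fprintf('consuming: start=%s end=%s\n', mat2str(s1), mat2str(e1));
fprintf('lookahead: start=%s end=%s\n', mat2str(s2), mat2str(e2));
```

With the consuming pattern the single match spans indices 3 to 6 (+bb+), so the search resumes at index 7, past the + that cc would need. With the lookahead the first match ends at index 5, so the + at index 6 is still available to start the second match.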
Upvotes: 2