Reputation: 473
I need to find all operations in a simple expression using a regex. For example:
a+b*c/d
Here we have 3 operations.
A regex like \d.*[\+\-\*\/].*\d
returns only two matches.
Is there any way to find all matches?
Upvotes: 2
Views: 796
Reputation: 14931
To arrive at the answer, I'll split it into simple steps.
For simplicity, we'll define a number as \d+, which means "match one or more digits". If you want a more comprehensive regex, you may take a look at this answer.
To match math operators, we can use a character class: [/*+-]. Most characters lose their special regex meaning inside a character class, so [.] will only match a literal dot. We'll use a delimiter other than /, so that we won't need to escape / in our expression. The hyphen - is normally used to define a character range such as a-z, but if you put it at the beginning or the end of the character class, you don't need to escape it.
Our regex will look like \d+\s*[/*+-]\s*\d+. The \s* is there to allow optional whitespace around the operator.
When you use the above pattern, you'll notice that it matches only a (math) b and c (math) d, whereas we also want to match b (math) c.
The problem
Let's take a simple example, 1+2*3/4, and see what the regex engine does with the expression \d+\s*[/*+-]\s*\d+:
1+2*3/4
^^^ match and advance
1+2*3/4
   ^ no match
1+2*3/4
    ^^^ match and advance
Nothing to do
So the problem is that once the engine finishes a match, it resumes right after the last matched character, so the 2 has already been consumed and can't start a new match. We want it to resume from the second number of the match instead.
1+2*3/4
^^^ match and advance
1+2*3/4
  ^ continue from here ?
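To see the problem concretely, here is a minimal Python sketch (Python is my choice for illustration, the question doesn't name a language). The name naive_matches is hypothetical; it just lists each match of the plain pattern together with its start and end offsets, so you can see the engine resuming after the 2.

```python
import re

# Plain pattern from above: number, operator, number,
# with optional whitespace around the operator.
NAIVE = re.compile(r"\d+\s*[/*+-]\s*\d+")

def naive_matches(expr):
    """Return (text, start, end) for each non-overlapping match."""
    return [(m.group(), m.start(), m.end()) for m in NAIVE.finditer(expr)]

print(naive_matches("1+2*3/4"))
# [('1+2', 0, 3), ('3/4', 4, 7)]
```

The first match ends at offset 3, so the search resumes at the * and 2*3 is never reported.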
The solution
We'll need a zero-width lookahead assertion, (?=...). For example, a(?=b) means "match a only if it is followed by b", so a gets matched in ab but not in ac. The advantage is that the lookahead doesn't consume anything: after the match, the regex engine continues from the position of b instead of the position after b.
ab
^ match and continue
ab
 ^ no match
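A quick way to check that the lookahead really consumes nothing is to look at where the match ends. This is a small Python sketch under the same assumption as before (Python chosen for illustration; lookahead_span is a hypothetical helper name).

```python
import re

# "a" only when followed by "b"; the "b" itself is not part of the match.
LOOKAHEAD = re.compile(r"a(?=b)")

def lookahead_span(text):
    """Return the (start, end) span of the first match, or None."""
    m = LOOKAHEAD.search(text)
    return m.span() if m else None

print(lookahead_span("ab"))  # (0, 1)
print(lookahead_span("ac"))  # None
```

The match in ab ends at offset 1, which is the position of b, so the engine's next attempt starts at b rather than after it.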
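We can exploit this and use a capturing group to "dump" the desired result into a group: (?=(\d+\s*[/*+-]\s*\d+)). Since the lookahead itself consumes nothing, the overall match is empty and the engine simply moves one character forward after each attempt.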
1+2*3/4
^
^^^ match dump it in group 1 and continue
1+2*3/4
 ^ no match
1+2*3/4
  ^
  ^^^ match dump it in group 1 and continue
1+2*3/4
   ^ no match
1+2*3/4
    ^
    ^^^ match dump it in group 1 and continue
1+2*3/4
     ^ no match
1+2*3/4
      ^ no match
The end
So far so good, but when we tested some other numbers we got weird results. The input 12+3 gave us two results in group 1 instead of one: 12+3 and 2+3. What's the reason?
Let's take a look step by step:
12+3
^
^^^^ match and dump it in group 1 and continue
12+3
 ^
 ^^^ match and dump it in group 1 and continue
12+3
  ^ no match
12+3
   ^ no match
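Both walkthroughs can be reproduced with a short Python sketch (again an assumption on my part, any engine with lookahead should behave the same way). The pattern is the lookahead-only version, with optional whitespace allowed on both sides of the operator as in the first pattern; group1_matches is a hypothetical helper name.

```python
import re

# Lookahead-only pattern: every match is empty, the text we want is in group 1.
OVERLAPPING = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))")

def group1_matches(expr):
    """Collect group 1 from every (zero-width) match."""
    return [m.group(1) for m in OVERLAPPING.finditer(expr)]

print(group1_matches("1+2*3/4"))  # ['1+2', '2*3', '3/4']
print(group1_matches("12+3"))     # ['12+3', '2+3']
```

The second call shows the unwanted 2+3: the engine retries at every character, including the middle of a multi-digit number.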
Ah, it seems that advancing by one character isn't good after all. After the lookahead we need to actually consume the first number, so the engine skips past all of its digits: (?=(\d+\s*[/*+-]\s*\d+))\d+
12+3
^^
^^^^ match and dump it in group 1 and continue
12+3
  ^ no match
12+3
   ^ no match
A bit late for a TL;DR: use ~(?=(\d+\s*[/*+-]\s*\d+))\d+~ and read the results from group 1. In some languages you'll need the g modifier to get all matches.
Depending on the language, you might not be able to use custom delimiters, in which case you'll need to escape / in your expression.
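For what it's worth, here is how the final pattern might look in Python, which is only my assumption since the question doesn't say which language is used. Python has no delimiters and no g modifier; re.findall already returns every match, and with a single capturing group it returns the contents of group 1. The name find_operations is hypothetical.

```python
import re

# Lookahead captures "number operator number" into group 1,
# then \d+ consumes the first number so the next attempt starts after it.
OPERATIONS = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))\d+")

def find_operations(expr):
    """Return every binary operation in expr, including overlapping ones."""
    return OPERATIONS.findall(expr)

print(find_operations("1+2*3/4"))      # ['1+2', '2*3', '3/4']
print(find_operations("12+3"))         # ['12+3']
print(find_operations("10 + 20 * 3"))  # ['10 + 20', '20 * 3']
```

Note that this sketch only handles unsigned integers, exactly like the \d+ definition above; decimals or negative numbers would need a more comprehensive number pattern.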
Upvotes: 10