Baranovskiy Dmitry

Regular expression with looking back

I need to find all operations in a simple expression using a regex. For example:

a+b*c/d

Here we have 3 operations:

  1. a+b
  2. b*c
  3. c/d

A regex like \d.*[\+\-\*\/].*\d returns only two matches:

  1. a+b
  2. c/d

Is there any way to find all matches?

Answers (1)

HamZa

To arrive at the answer, I'll split it into simple steps.

1) Match a (math) b

For simplicity, we'll define a number as \d+, which means "one or more digits". If you want a more comprehensive regex, take a look at this answer.

To match the math operators, we can use the character class [/*+-]. Inside a character class most metacharacters lose their special meaning, so [.] matches only a literal dot. We'll use a delimiter other than /, so that we won't need to escape / in our expression. The hyphen - is normally used to define a character range such as a-z, but if you put it at the beginning or the end of the character class, it is taken literally and doesn't need escaping.
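
To check these character-class rules quickly, here is a small sketch using Python's `re` module (Python is an assumption; the answer itself is language-agnostic, and the sample strings are made up). Python patterns are plain raw strings with no delimiters, so / never needs escaping there.

```python
import re

# Operators to look for. Inside [...] the *, + and / need no escaping,
# and a trailing - is taken literally instead of starting a range.
OPERATOR = re.compile(r"[/*+-]")

# A dot inside a character class is a literal dot, not "any character".
LITERAL_DOT = re.compile(r"[.]")

operators = OPERATOR.findall("a+b*c/d-e")
print(operators)                   # ['+', '*', '/', '-']

print(LITERAL_DOT.findall("a.b"))  # ['.']
print(LITERAL_DOT.findall("abc"))  # []
```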

Our regex will look like \d+\s*[/*+-]\s*\d+. The \s* parts optionally match whitespace around the operator.
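
A minimal Python sketch of this first pattern, assuming Python's `re` module and made-up inputs. Because `findall` returns only non-overlapping matches, it already shows the problem discussed in the next step: 2*3 is missing.

```python
import re

# Step 1: a number, optional whitespace, an operator, optional whitespace, a number.
SIMPLE = re.compile(r"\d+\s*[/*+-]\s*\d+")

pairs = SIMPLE.findall("1+2*3/4")
print(pairs)   # ['1+2', '3/4']

spaced = SIMPLE.findall("10 + 2")
print(spaced)  # ['10 + 2']
```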

2) Match a (math) b (math) c (math) d

If you use the above pattern, you'll notice that it matches only a (math) b and c (math) d, whereas we also want to match b (math) c.

The problem

Let's take the simple example 1+2*3/4 and see what happens when the regex engine applies \d+\s*[/*+-]\s*\d+:

1+2*3/4
^^^ match and advance

1+2*3/4
   ^ no match

1+2*3/4
    ^^^ match and advance

Nothing to do

So the problem is that once the engine finishes a match, it resumes searching at the position right after that match, whereas we want it to resume right after the first number.
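
The resume position can be observed with Python's `re.finditer`, which reports where each match starts and ends (a sketch; Python and the sample input are assumptions). The second search attempt begins at index 3, the end of 1+2, so the 2 can never start a match.

```python
import re

SIMPLE = re.compile(r"\d+\s*[/*+-]\s*\d+")

# Record (start, end, text) for every match; each new search attempt
# begins where the previous match ended.
spans = [(m.start(), m.end(), m.group()) for m in SIMPLE.finditer("1+2*3/4")]
for start, end, text in spans:
    print(start, end, text)
# 0 3 1+2
# 4 7 3/4
```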

1+2*3/4
^^^ match and advance

1+2*3/4
 ^ continue from here?

The solution

We'll need a zero-width lookahead assertion, (?=...). For example, a(?=b) matches a only if it is followed by b, so a is matched in ab but not in ac. Because the lookahead only checks for b without consuming it, the regex engine continues from the position of b instead of the position after b.
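
A small Python sketch of the a(?=b) example (Python's `re` module is an assumption). The span of the match ends before b, which shows that the lookahead doesn't consume it.

```python
import re

# a(?=b): match "a" only when "b" follows, without consuming the "b".
A_BEFORE_B = re.compile(r"a(?=b)")

print(A_BEFORE_B.findall("ab"))  # ['a']
print(A_BEFORE_B.findall("ac"))  # []

m = A_BEFORE_B.search("ab")
print(m.span())                  # (0, 1): the match stops before "b"
```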

ab
^ match and continue

ab
 ^ no match

We can exploit this by putting a capturing group inside the lookahead to "dump" the desired result into group 1: (?=(\d+\s*[/*+-]\s*\d+)). The overall match is empty; the operation is available in group 1.

1+2*3/4
^
^^^ match and dump it in group 1 and continue

1+2*3/4
 ^ no match

1+2*3/4
  ^
  ^^^ match and dump it in group 1 and continue

1+2*3/4
   ^ no match

1+2*3/4
    ^
    ^^^ match and dump it in group 1 and continue

1+2*3/4
     ^ no match

1+2*3/4
      ^ no match

The end
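
The same walk-through in Python, as a sketch assuming the `re` module. When a pattern has exactly one capturing group, `re.findall` returns that group's text, so it collects the operations directly; `finditer` shows that every overall match is empty.

```python
import re

# The whole match is zero-width; the operation is captured in group 1.
OVERLAPPING = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))")

ops = OVERLAPPING.findall("1+2*3/4")
print(ops)  # ['1+2', '2*3', '3/4']

# Each overall match is empty (start == end); only group 1 has content.
for m in OVERLAPPING.finditer("1+2*3/4"):
    print(m.span(), repr(m.group(0)), m.group(1))
# (0, 0) '' 1+2
# (2, 2) '' 2*3
# (4, 4) '' 3/4
```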

3) A wild problem appeared

So far so good, but when we test other numbers we get some weird results. For the input 12+3, group 1 holds two results, 12+3 and 2+3, instead of just 12+3. What's the reason?

Well, let's take a look step by step:

12+3
^
^^^^ match and dump it in group 1 and continue

12+3
 ^
 ^^^ match and dump it in group 1 and continue

12+3
  ^ no match

12+3
   ^ no match
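
The problem is easy to reproduce with the lookahead-only pattern in Python (a sketch; Python is an assumption):

```python
import re

OVERLAPPING = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))")

# At index 0 the lookahead captures "12+3"; the engine then retries at
# index 1, where "2+3" also satisfies the lookahead.
result = OVERLAPPING.findall("12+3")
print(result)  # ['12+3', '2+3']
```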

Ah, it seems that advancing one character at a time isn't good after all. So, after the lookahead, we also need to consume the first number: (?=(\d+\s*[/*+-]\s*\d+))\d+

12+3
^^
^^^^ match and dump it in group 1 and continue

12+3
  ^ no match

12+3
   ^ no match
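
A Python sketch of the fixed pattern, assuming the `re` module; the third input is a made-up example with multi-digit numbers.

```python
import re

# Capture the operation in the lookahead, then consume the first number
# so the next attempt starts after it rather than inside it.
FIXED = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))\d+")

print(FIXED.findall("12+3"))      # ['12+3']
print(FIXED.findall("1+2*3/4"))   # ['1+2', '2*3', '3/4']
print(FIXED.findall("10*20-30"))  # ['10*20', '20-30']
```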

A bit late for a TL;DR: use ~(?=(\d+\s*[/*+-]\s*\d+))\d+~ (with the g modifier in languages that require one) and read the operations from group 1.

Depending on the language, you might not be able to use custom delimiters, in which case you may need to escape / in your expression.
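
Wrapped up as a reusable Python helper, the TL;DR might look like this. The function name `find_operations` is hypothetical, and Python is an assumption. Since Python patterns have no delimiters, / needs no escaping, and `findall` always returns every non-overlapping match, so no g modifier is involved.

```python
import re
from typing import List

# TL;DR pattern as a Python raw string: no delimiters, no escaping of "/".
OPERATION = re.compile(r"(?=(\d+\s*[/*+-]\s*\d+))\d+")


def find_operations(expression: str) -> List[str]:
    """Hypothetical helper: return every 'number operator number' pair,
    including overlapping ones, using the group-1 capture of each match."""
    return [m.group(1) for m in OPERATION.finditer(expression)]


if __name__ == "__main__":
    print(find_operations("1 + 2 * 3 / 4"))  # ['1 + 2', '2 * 3', '3 / 4']
    print(find_operations("12+3"))           # ['12+3']
```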
