user3737387
user3737387

Reputation: 3

Rules of regex engines. Greediness, eagerness and laziness of regexes

As we all know, regex engine use two rules when it goes about its work:

These lines appear in tutorial:

The two of these rules go hand in hand.

It's eager to give you a result, so what it does is it tries to just keep letting that first one do all the work.

While we're already in the middle of it, let's keep going, get to the end of the string and then when it doesn't work out, then it will backtrack and try another one.

It doesn't backtrack back to the beginning; it doesn't try all sorts of other combinations.

It's still eager to get you a result, so it says, what if I just gave back one?

Would that allow me to give a result back?

If it does, great, it's done. It's able to just finish there.

It doesn't have to keep backtracking further in the string, looking for some kind of a better match or match that's further along.

I don't quite understand these lines (especially 2nd ("While we're...") and last ("It doesn't have to keep backtracking") sentences).

And these lines about lazy mode.

It still defers to the overall match just like the greedy one does

I don't understand the following analogy:

It's not necessarily any faster or slower to choose a lazy strategy or a greedy strategy, but it will probably match different things.

Now as far as is faster or slower, it's a little bit like saying, if you've lost your car keys and your sunglasses inside your house, is it better to start looking in the kitchen or to start looking in the living room?

You don't know which one's going to yield the best result, and you don't know which one's going to find the sunglasses first or the keys first; it's just about different strategies of starting the search.

So you will likely get different results depending on where you start, but it's not necessarily faster to start in one place or the other.

What 'faster or slower' means?

I'm going to draw scheme how it work (in both case). So I will contemplate this questions until I find out what's going on around here!) I need understand it exactly and unambiguously.

Thanks.

Upvotes: 0

Views: 293

Answers (1)

Tensibai
Tensibai

Reputation: 15784

Let's try by the exemple

for an input of this is input for test input on regex and a regex like /this.*input/

The match will be this is input for test input

What will be done is

  • starting to examine the string and it will get a match with this is input

  • But now its at the middle of the string, it will continue to see if it could match more on it (this is the While we're already in the middle of it, let's keep going )

  • It will match till this is input for test input and continue till the end of the string

  • at the end, there's things wich are not part of the match, so the interpreter "backtrack" to the last time it matches.

For the last part its more about the ored regexes

Consider input string as cdacdgabcdef and the regex (ab|a).*

A common mistake is thinking it will return the more precise one (in this case 'abcdef') but it will return 'acdgabcdef' because the a match is the first one to match.

what happens here is: There's something matching this part, let's continue to the next part of the pattern and forget about the other options in this part.

For the lazy and greedy questions, the link of @AvinashRaj is clear enough, I won't repeat it here.

Upvotes: 1

Related Questions