Westin

Reputation: 33

Non-greedy regular expression not matching what I expected

I am searching through text for lines which use "variable2" without a semicolon before it in the line. Here is my regular expression to solve this.

^[^;]*?variable2

My understanding is that this should find text starting at the beginning of a line and use as few non-semicolon characters as possible before "variable2". It fails to select what I expect in this example.


Label0: mov     variable0,WREG             ;Some comment
        mov     W0,variable1

Label1: btsc    variable2,#1               ;Some other comment
        bra     label2

I expected to get this

Label1: btsc    variable2

but selected this instead

        mov     W0,variable1

Label1: btsc    variable2

What am I misunderstanding? It seems to me the non-greedy expression is not doing what I intended it to do. If I change my regular expression to ^[^;\n]*?variable2, it selects what I expect. I am using Sublime Text 2 for my regular expressions, but I seem to get the same results in PHP, JavaScript, and Python (according to regex101.com).

Upvotes: 3

Views: 1237

Answers (3)

Jan

Reputation: 43169

You can use a negative lookahead:

^(?:(?!;).)+variable2

See a demo on regex101.com (and mind the multiline modifier!).

^           # matches the beginning of the line
(?:(?!;).)+ # match one or more characters (other than a newline),
            # checking before each one
            # that it is not a semicolon
variable2   # match variable2
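To check this against the listing from the question, here is a minimal sketch in Python (an assumption on my side; the question was asked about Sublime Text, and the names text, pattern, and star_pattern are only illustrative). It also shows one edge case worth knowing about: because of the +, a line that begins with variable2 in the very first column is not matched, while a * variant does match it.

```python
import re

# Sample listing from the question.
text = (
    "Label0: mov     variable0,WREG             ;Some comment\n"
    "        mov     W0,variable1\n"
    "\n"
    "Label1: btsc    variable2,#1               ;Some other comment\n"
    "        bra     label2\n"
)

# re.MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r"^(?:(?!;).)+variable2", re.MULTILINE)

# The dot never matches a newline, so a match cannot span lines.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # a single match, taken from the Label1 line

# A commented-out use is rejected: the semicolon comes first.
commented = pattern.search("        ;uses variable2")
print(commented)  # None

# Edge case: + needs at least one character before variable2.
at_column_zero = pattern.search("variable2,#1")
print(at_column_zero)  # None

# Hypothetical variant with * that also accepts column zero.
star_pattern = re.compile(r"^(?:(?!;).)*variable2", re.MULTILINE)
print(star_pattern.search("variable2,#1").group(0))  # variable2
```

Whether the + or the * form is the right one depends on whether variable2 can ever start a line in your files.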

Upvotes: 1

Sebastian Proske

Reputation: 8413

You have a key point of lazy matching wrong here: it doesn't try to find the overall shortest possible match. It finds the shortest possible match starting from the leftmost position where a match can begin. Let's take a much shorter regex to show this: a*?b. Given the string aab, you are expecting the lazy match to be ab, but it is aab.

The regex engine starts at the first character in the string (the first a). Being lazy, a*? first matches nothing, but b cannot match there because the first character is an a. The engine then expands a*? to match a and tries b again, which fails because the second character is still an a. It then expands a*? to match aa and can now match b, giving the overall match aab. The engine never tries to start at the second a, because a match starting at the first one has already succeeded.
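The same behaviour can be reproduced in a few lines of Python (chosen here only because the question mentions it; the variable names are illustrative). The sketch also shows that laziness does shorten the match, but only at the right-hand end, never by moving the starting position.

```python
import re

lazy = re.compile(r"a*?b")

# Leftmost start wins: the match begins at index 0, not index 1.
m = lazy.search("aab")
print(m.group(0), m.span())  # aab (0, 3)

# Only when the search is forced to begin at index 1 do we get "ab".
forced = lazy.search("aab", 1)
print(forced.group(0))  # ab

# Laziness affects where the match ends, not where it starts.
lazy_end = re.search(r"a.*?b", "aabab").group(0)
greedy_end = re.search(r"a.*b", "aabab").group(0)
print(lazy_end)    # aab
print(greedy_end)  # aabab
```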

Upvotes: 1

gaganso

Reputation: 3011

^[^;]*?variable2

This regex matches anything other than ; from the start of a line up to variable2. The class [^;] also matches newline characters, and line 2 and line 3 (just a newline) of your sample don't contain any ;, so the match starts at the beginning of line 2 and runs until variable2. Since you are using multiline mode, ^ acts as an anchor at the start of each line.


^[^;\n]*?variable2

This regex matches anything other than ; and \n from the start of a line up to variable2. Line 2 and line 3 can no longer be part of the match, since each ends in \n, which the character class now excludes.
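Both patterns can be compared side by side on the listing from the question. This is a minimal sketch in Python (one of the flavours the question mentions; the names text, across_lines, and single_line are only illustrative).

```python
import re

# Sample listing from the question.
text = (
    "Label0: mov     variable0,WREG             ;Some comment\n"
    "        mov     W0,variable1\n"
    "\n"
    "Label1: btsc    variable2,#1               ;Some other comment\n"
    "        bra     label2\n"
)

# [^;] happily consumes newlines, so the match can start on line 2.
across_lines = re.search(r"^[^;]*?variable2", text, re.MULTILINE)
print(repr(across_lines.group(0)))  # spans lines 2, 3 and part of line 4

# [^;\n] cannot cross a line break, so only line 4 can match.
single_line = re.search(r"^[^;\n]*?variable2", text, re.MULTILINE)
print(repr(single_line.group(0)))  # stays on the Label1 line
```

Line 1 is skipped by both patterns because its semicolon comes before any variable2.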


Upvotes: 2
