Reputation: 33
I am searching through text for lines which use "variable2" without a semicolon before it in the line. Here is my regular expression to solve this.
^[^;]*?variable2
My understanding of this is that it should find text starting at the beginning of a line that minimizes the number of non-semicolon characters before "variable2". This fails to select what I expect in this example.
Label0: mov variable0,WREG ;Some comment
mov W0,variable1
Label1: btsc variable2,#1 ;Some other comment
bra label2
I expected to get this
Label1: btsc variable2
but selected this instead
mov W0,variable1
Label1: btsc variable2
What am I misunderstanding? It seems to me the non-greedy expression is not doing what I intended it to do. If I change my regular expression to ^[^;\n]*?variable2, it selects what I expect it to select. I am using Sublime Text 2 for my regular expressions, but I seem to get the same results in PHP, JavaScript, and Python (according to regex101.com).
Upvotes: 3
Views: 1237
Reputation: 43169
You can use a negative lookahead:
^(?:(?!;).)+variable2
See a demo on regex101.com (and mind the multiline modifier!).
^ # matches the beginning of the line
(?:(?!;).)+ # match one or more characters (any except a newline),
# checking before each one that it
# is not a semicolon
variable2 # match variable2
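To check this outside regex101, here is a minimal Python sketch. The sample text is the snippet from the question; the names `sample`, `lookahead` and `matches` are only illustrative.

```python
import re

# Sample text copied from the question.
sample = (
    "Label0: mov variable0,WREG ;Some comment\n"
    "mov W0,variable1\n"
    "Label1: btsc variable2,#1 ;Some other comment\n"
    "bra label2\n"
)

# re.MULTILINE makes ^ match at the start of every line.
# "." does not match a newline by default, so a match stays on one line.
lookahead = re.compile(r"^(?:(?!;).)+variable2", re.MULTILINE)

matches = [m.group(0) for m in lookahead.finditer(sample)]
print(matches)  # ['Label1: btsc variable2']
```

Note that the + here is greedy and requires at least one character before variable2, so a line that begins directly with variable2 would not match; swapping + for * should cover that case.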
Upvotes: 1
Reputation: 8413
You are getting a key point of lazy matching wrong here: it doesn't try to find the overall shortest possible match, but the shortest possible match from the leftmost position where a match can start. Let's take a much shorter regex to show this: a*?b. Given the string aab, you are expecting the lazy match to be ab, but it is aab.
The regex engine starts at the first character of the string (the first a). Being lazy, a*? first matches nothing, so the engine tries b at that position and fails because the character there is an a. It then expands a*? to match a and tries b again, but fails because the second character is still an a. It then expands a*? to match aa and can now successfully match b, giving the overall match aab.
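A quick way to see this for yourself is the small Python sketch below (the variable names are just for illustration). It also shows that the shorter match ab is only found when the search is forced to start at the second character.

```python
import re

lazy = re.compile(r"a*?b")

# The search begins at index 0, and the lazy quantifier expands
# until b can match, so the result covers the whole string.
from_start = lazy.search("aab").group(0)

# Starting the search at index 1 gives the shorter match.
from_second = lazy.search("aab", 1).group(0)

print(from_start)   # aab
print(from_second)  # ab
```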
Upvotes: 1
Reputation: 3011
^[^;]*?variable2
This regex matches anything other than ; from the start of a line up to variable2. Note that [^;] also matches \n, so a match can run across line breaks. Since line 2 and line 3 (just a newline) also don't contain any ; they are matched starting from the beginning of the 2nd line up to variable2. Since you are using multiline mode, ^ acts as an anchor for each line.
^[^;\n]*?variable2
This regex matches anything other than ; and \n from the start of a line up to variable2. Line 2 and line 3 are no longer matched together, since the match cannot cross the \n at the end of a line.
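The difference between the two patterns can be reproduced with a short Python sketch, assuming the sample text from the question and multiline mode (the variable names are made up for the example):

```python
import re

# Sample text copied from the question.
sample = (
    "Label0: mov variable0,WREG ;Some comment\n"
    "mov W0,variable1\n"
    "Label1: btsc variable2,#1 ;Some other comment\n"
    "bra label2\n"
)

# [^;] also matches "\n", so the match may span several lines.
spans_lines = re.search(r"^[^;]*?variable2", sample, re.MULTILINE).group(0)

# [^;\n] excludes the newline, so the match stays on a single line.
single_line = re.search(r"^[^;\n]*?variable2", sample, re.MULTILINE).group(0)

print(repr(spans_lines))  # 'mov W0,variable1\nLabel1: btsc variable2'
print(repr(single_line))  # 'Label1: btsc variable2'
```

The first pattern cannot start on line 1 because the ; there comes before any variable2, so the earliest line start that works is line 2.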
Upvotes: 2