Reputation: 3594
I thought I was OK with regex, but this has me confused. I have this line in Python:
dependencies = re.findall( r"-- *depends *on *([^ ]*.*[^ ]) *$", script, re.MULTILINE)
which works really well with:
"-- depends on b " -> ["b"]
"-- depends on b" -> ["b"]
"--dependson green things \n-- depends on red things\nother stuff" -> ["green things", "red things"]
"-- depends on b \n-- depends on c" -> ["b", "c"]
but doesn't work on
"-- depends on b\n-- depends on c" -> ["b\n-- depends on c"]
I suspect it has something to do with the fact that $ matches before the newline, but what I don't get is how to fix the regex.
Upvotes: 1
Views: 62
Reputation: 626738
In Python re, the re.MULTILINE option only redefines the behavior of two anchors, ^ and $, so that they match at the start and end of every line, not just of the whole string:
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string. Corresponds to the inline flag (?m).
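To make the quoted behavior concrete, here is a minimal sketch that compares $ with and without re.MULTILINE. The sample string and variable names are made up for illustration.

```python
import re

text = "first\nsecond\nthird"  # illustrative input, not from the question

# Default mode: $ matches only at the very end of the string
# (or just before a single trailing newline).
default_matches = re.findall(r"\w+$", text)

# Multiline mode: $ also matches immediately before each newline.
multiline_matches = re.findall(r"\w+$", text, re.MULTILINE)

print(default_matches)    # ['third']
print(multiline_matches)  # ['first', 'second', 'third']
```

Note that the flag changes only where the anchors can match; it has no effect on what character classes such as [^ ] accept.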
Next, the [^ ] negated character class matches any char other than a literal regular space char (\x20, dec. code 32). Thus, [^ ]* matches any zero or more chars other than a space (including a newline, too).
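A short check of this point, using the pattern and the failing input from the question. The variable names are only for illustration.

```python
import re

# [^ ] excludes only the space character, so a newline is accepted.
newline_accepted = re.fullmatch(r"[^ ]", "\n") is not None
print(newline_accepted)  # True

original = r"-- *depends *on *([^ ]*.*[^ ]) *$"
script = "-- depends on b\n-- depends on c"

# [^ ]* greedily consumes "b\n--", so the capture runs across both lines.
captured = re.findall(original, script, re.MULTILINE)
print(captured)  # ['b\n-- depends on c']
```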
You can use:
-- *depends *on *(.*\S) *$
Or, if the input may contain non-breaking spaces or other horizontal Unicode whitespace:
--[^\S\r\n]*depends[^\S\r\n]*on[^\S\r\n]*(.*\S)[^\S\r\n]*$
In Python, you can build the pattern with an f-string:
h = r'[^\S\r\n]'  # any whitespace char except CR and LF
pattern = fr'--{h}*depends{h}*on{h}*(.*\S){h}*$'
The {h}*(.*\S) part does the job: zero or more horizontal whitespace chars are matched and consumed first, then zero or more chars other than line break chars, as many as possible (.*), followed by a non-whitespace char (\S), are captured into Group 1.
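As a quick sanity check, the sketch below runs the horizontal-whitespace pattern against the inputs listed in the question. The samples list and the results name are introduced here for illustration only.

```python
import re

h = r"[^\S\r\n]"  # any whitespace char except CR and LF
pattern = re.compile(fr"--{h}*depends{h}*on{h}*(.*\S){h}*$", re.MULTILINE)

# Inputs taken from the question, including the one that used to fail.
samples = [
    "-- depends on b ",
    "-- depends on b",
    "--dependson green things \n-- depends on red things\nother stuff",
    "-- depends on b \n-- depends on c",
    "-- depends on b\n-- depends on c",
]

results = [pattern.findall(script) for script in samples]
for result in results:
    print(result)
# ['b']
# ['b']
# ['green things', 'red things']
# ['b', 'c']
# ['b', 'c']
```

Because . never matches a newline without re.DOTALL, the capture cannot spill onto the next line.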
Upvotes: 1
Reputation: 122
The [^ ] class is matching the "\n" newline as "not a space". For this example, you can fix it like so:
-- *depends *on *([^ \n]*.*[^ \n]) *$
You probably really wanted something like:
--\s*depends\s*on\s*(\S*.*\S)\s*$
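\s means any whitespace character (including newlines) and \S means any non-whitespace character. Because \s also matches newlines, \s* in this pattern can run across line boundaries.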
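One thing worth checking before adopting the \s version: since \s matches newlines, a "depends on" line with nothing after it can pick up the following line. The input below is a made-up edge case, not one from the question, and the variable names are illustrative.

```python
import re

# Hypothetical edge case: the dependency name is missing on the first line.
script = "-- depends on\nnot a dependency"

with_s = r"--\s*depends\s*on\s*(\S*.*\S)\s*$"
with_space = r"-- *depends *on *(.*\S) *$"

# \s* after "on" consumes the newline, so the next line is captured.
s_result = re.findall(with_s, script, re.MULTILINE)

# A literal space cannot cross the newline, so nothing matches.
space_result = re.findall(with_space, script, re.MULTILINE)

print(s_result)      # ['not a dependency']
print(space_result)  # []
```

If input like this can occur, a class that excludes line breaks, such as a literal space or [^\S\r\n], is the safer choice.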
Upvotes: 0