ehacinom

Reputation: 8894

Regex negative lookahead ignoring comments

I'm having problems with this regex. I want to pull out just MATCH3, because the others, MATCH1 and MATCH2, are commented out.

#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment

The regex I have captures all three MATCH values.

(?<=url\(r'\^)(.*?)(?=\$',)

How do I ignore lines beginning with a comment? With a negative lookahead? Note the # character is not necessarily at the start of the line.

EDIT: Sorry, all answers are good! The original example forgot the comma after the $' at the end of the match group.

Upvotes: 0

Views: 233

Answers (4)

hwnd

Reputation: 70732

You really don't need lookarounds here. You can allow for possible leading whitespace at the start of the line, then match "url" and the context that follows it, capturing the part you want to keep.

>>> import re
>>> s = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""
>>> re.findall(r"(?m)^\s*url\(r'\^([^$]+)", s)
['MATCH3']

Upvotes: 1

Kasravnd

Reputation: 107337

As an alternative, you can split each line on '#'. If the first element contains 'url' (meaning the call is not commented out), you can use re.search to match the substring you want:

>>> [re.search(r"url\(r'\^(.*?)\$'" ,i[0]).group(1) for i in [line.split('#') for line in s.split('\n')] if 'url' in i[0]]
['MATCH3']

Also note that you don't need to use look-arounds for your pattern; you can just use grouping!
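For readability, the same split-on-'#' idea can be sketched as a plain loop. The helper name uncommented_urls is made up for illustration, and the sketch assumes no '#' ever appears inside the URL pattern itself (such a pattern would be cut short by the split).

```python
import re

s = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""

# Same pattern as in the one-liner: capture the text between ^ and $'.
URL_RE = re.compile(r"url\(r'\^(.*?)\$'")

def uncommented_urls(text):
    """Return URL names from lines whose url(...) call is not commented out."""
    found = []
    for line in text.splitlines():
        # Keep only the part of the line before the first '#'.
        code = line.split('#', 1)[0]
        m = URL_RE.search(code)
        if m:
            found.append(m.group(1))
    return found

print(uncommented_urls(s))  # ['MATCH3']
```

Searching only the text before the first '#' also drops trailing comments, so a url(...) mentioned inside a comment at the end of a line is ignored.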

Upvotes: 1

If this is the only place where you need to match, then match the beginning of a line, followed by optional whitespace, followed by url:

(?m)^\s*url\(r'(.*?)'\)
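A quick way to try this pattern from Python is sketched below. Note that the group captures the ^ and $ anchors too, so they are stripped afterwards. The second pattern, which tolerates an optional trailing comma before the closing parenthesis (as in the question's edited sample), is an assumed variation and not part of the pattern above.

```python
import re

# Sample without the trailing comma, as this pattern expects.
sample = """#   url(r'^MATCH1/$'),
   #url(r'^MATCH2$'),
    url(r'^MATCH3$') # comment"""

pattern = re.compile(r"(?m)^\s*url\(r'(.*?)'\)")
raw = pattern.findall(sample)
print(raw)  # ['^MATCH3$']

# The capture still contains the anchors; strip them to get the bare name.
names = [r.strip('^$') for r in raw]
print(names)  # ['MATCH3']

# Assumed variation: allow an optional comma (and whitespace) before ')'.
with_comma = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""

tolerant = re.compile(r"(?m)^\s*url\(r'(.*?)',?\s*\)")
names_with_comma = [r.strip('^$') for r in tolerant.findall(with_comma)]
print(names_with_comma)  # ['MATCH3']
```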

If you need to cover more complicated cases, I'd suggest using ast.parse instead, as it truly understands the Python source code parsing rules.

import ast

tree = ast.parse("""(
#   url(r'^MATCH1/$'),
   #url(r'^MATCH2$'),
    url(r'^MATCH3$') # comment
)""")

class UrlCallVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        if getattr(node.func, 'id', None) == 'url':
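            # On Python 3.8+ string literals parse as ast.Constant (ast.Str was removed in 3.12)
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                print(first.value.strip('$^'))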

        self.generic_visit(node)

UrlCallVisitor().visit(tree)

This prints the first string literal argument of each call to a function named url; in this case, it prints MATCH3. Notice that the source passed to ast.parse needs to be well-formed Python source code (hence the parentheses; otherwise a SyntaxError is raised).
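If a list is more convenient than printed output, the same idea can be sketched with ast.walk. The function name url_patterns is hypothetical, and the sketch assumes Python 3.8 or later, where string literals are parsed as ast.Constant nodes.

```python
import ast

def url_patterns(source):
    """Collect the first string argument of every call to a function named url."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Only plain-name calls such as url(...) have a func.id attribute.
        if isinstance(node, ast.Call) and getattr(node.func, 'id', None) == 'url':
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                found.append(first.value.strip('$^'))
    return found

source = """(
#   url(r'^MATCH1/$'),
   #url(r'^MATCH2$'),
    url(r'^MATCH3$') # comment
)"""

print(url_patterns(source))  # ['MATCH3']
```

Because the comments never reach the syntax tree, no special handling of '#' is needed here.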

Upvotes: 1

vks

Reputation: 67978

^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))

Try this. Commented lines are consumed by the first alternative and leave the group empty, so grab the capture group. See the demo:

https://www.regex101.com/r/rK5lU1/37

import re
p = re.compile(r'^\s*#.*$|(?<=url\(r\'\^)(.*?)(?=\$\'\))', re.IGNORECASE | re.MULTILINE)
test_str = "# url(r'^MATCH1/$'),\n #url(r'^MATCH2$'),\n url(r'^MATCH3$') # comment"

re.findall(p, test_str)
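Since the commented lines match the first alternative without filling the group, findall returns an empty string for each of them. A minimal sketch of keeping only the real capture follows; the variable names all_captures and matches are illustrative, and re.IGNORECASE is left out here because the pattern does not depend on it.

```python
import re

p = re.compile(r"^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))", re.MULTILINE)
test_str = "# url(r'^MATCH1/$'),\n #url(r'^MATCH2$'),\n url(r'^MATCH3$') # comment"

# One entry per match; comment lines contribute an empty group.
all_captures = p.findall(test_str)
print(all_captures)  # ['', '', 'MATCH3']

# Drop the empty entries produced by the comment alternative.
matches = [c for c in all_captures if c]
print(matches)  # ['MATCH3']
```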

Upvotes: 1
