Reputation: 8894
I'm having problems with this regex. I want to pull out just MATCH3, because the others, MATCH1 and MATCH2, are commented out.
# url(r'^MATCH1/$',),
#url(r'^MATCH2$',),
url(r'^MATCH3$',), # comment
The regex I have captures all of the MATCH's.
(?<=url\(r'\^)(.*?)(?=\$',)
How do I ignore lines beginning with a comment? With a negative lookahead? Note that the # character is not necessarily at the start of the line.
EDIT: Sorry, all answers are good! The example forgot a comma after the $' at the end of the match group.
Upvotes: 0
Views: 233
Reputation: 70732
You don't really need lookarounds here. You can allow for possible leading whitespace, then match "url" and the context that follows it, capturing the part you want to keep.
>>> import re
>>> s = """# url(r'^MATCH1/$',),
#url(r'^MATCH2$',),
url(r'^MATCH3$',), # comment"""
>>> re.findall(r"(?m)^\s*url\(r'\^([^$]+)", s)
['MATCH3']
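For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name extract_active_urls is made up for illustration, and the sample indents the last line to show that \s* absorbs leading whitespace.

```python
import re

# Same pattern as above: start of line, optional whitespace, then url(r'^
# followed by everything up to (but not including) the first "$".
URL_PATTERN = re.compile(r"(?m)^\s*url\(r'\^([^$]+)")

def extract_active_urls(source):
    """Return the names from url(...) lines that are not commented out.

    Hypothetical helper, not part of any library.
    """
    return URL_PATTERN.findall(source)

sample = """# url(r'^MATCH1/$',),
#url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""

print(extract_active_urls(sample))  # ['MATCH3']
```

Because the pattern is anchored with ^ and only allows whitespace before url, any line whose first non-blank character is # cannot match.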
Upvotes: 1
Reputation: 107337
As an alternative, you can split each line on '#'. If the first element contains 'url' (that is, url appears before any # on the line), you can use re.search to match the substring you want:
>>> [re.search(r"url\(r'\^(.*?)\$'" ,i[0]).group(1) for i in [line.split('#') for line in s.split('\n')] if 'url' in i[0]]
['MATCH3']
Also note that you don't need look-arounds for your pattern; you can just use grouping!
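The one-liner can also be spelled out as a loop, which may be easier to read. This is a sketch under the same assumptions; active_url_names is a hypothetical name. It checks the result of re.search before calling group, so a line that contains "url" without matching the pattern is skipped instead of raising AttributeError.

```python
import re

def active_url_names(source):
    """Collect url names from the part of each line before the first '#'.

    Hypothetical helper. Limitation: a '#' inside the pattern string
    itself would also be treated as the start of a comment.
    """
    names = []
    for line in source.split('\n'):
        code = line.split('#', 1)[0]  # text before any comment marker
        match = re.search(r"url\(r'\^(.*?)\$'", code)
        if match:  # skip lines without an active url(...) call
            names.append(match.group(1))
    return names

sample = """# url(r'^MATCH1/$',),
#url(r'^MATCH2$',),
url(r'^MATCH3$',), # comment"""

print(active_url_names(sample))  # ['MATCH3']
```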
Upvotes: 1
Reputation: 133978
If this is the only place where you need to match, then match the beginning of a line, followed by optional whitespace, followed by url:
(?m)^\s*url\(r'(.*?)'\)
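A quick sketch of how this pattern behaves in Python. Note that it captures the whole string literal, anchors included, so the ^ and $ are stripped afterwards. The second pattern, with an optional comma before the closing parenthesis, is my own variation to cover the form from the question's edit, not part of the pattern above.

```python
import re

source = """# url(r'^MATCH1/$'),
#url(r'^MATCH2$'),
url(r'^MATCH3$') # comment"""

# The pattern as given: captures everything between r' and ')
raw = re.findall(r"(?m)^\s*url\(r'(.*?)'\)", source)
print(raw)  # ['^MATCH3$']

# Remove the regex anchors from each captured pattern
names = [pattern.strip('^$') for pattern in raw]
print(names)  # ['MATCH3']

# Assumed variation: allow a trailing comma, as in url(r'^MATCH3$',)
with_comma = "url(r'^MATCH3$',), # comment"
raw_comma = re.findall(r"(?m)^\s*url\(r'(.*?)',?\)", with_comma)
print(raw_comma)  # ['^MATCH3$']
```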
If you need to cover more complicated cases, I'd suggest using ast.parse instead, as it truly understands the Python source code parsing rules.
import ast
tree = ast.parse("""(
# url(r'^MATCH1/$'),
#url(r'^MATCH2$'),
url(r'^MATCH3$') # comment
)""")
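class UrlCallVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        # Only handle calls to a function referenced by the plain name "url".
        if getattr(node.func, 'id', None) == 'url':
            # ast.Constant is the current node type for literals;
            # ast.Str is deprecated since Python 3.8.
            arg = node.args[0] if node.args else None
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                print(arg.value.strip('$^'))
        # Keep walking so that nested calls are visited too.
        self.generic_visit(node)

UrlCallVisitor().visit(tree)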
This prints the first string literal argument of each call to a function named url; in this case, it prints MATCH3. Notice that the source for ast.parse needs to be well-formed Python source code (thus the parentheses; otherwise a SyntaxError is raised).
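If you would rather get a list back than print, a collecting variant could look like the sketch below. The names UrlPatternCollector and collect_url_patterns are hypothetical, and wrapping the fragment in parentheses is an assumption that the input is a run of url(...) entries like the ones in the question. It uses ast.Constant, which is how current Python versions represent string literals.

```python
import ast

class UrlPatternCollector(ast.NodeVisitor):
    """Collect the first string argument of every call to a name 'url'."""

    def __init__(self):
        self.patterns = []

    def visit_Call(self, node):
        if getattr(node.func, 'id', None) == 'url' and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                self.patterns.append(first.value.strip('$^'))
        self.generic_visit(node)  # also visit nested calls

def collect_url_patterns(fragment):
    # Wrap in parentheses so a bare run of entries parses as one expression.
    tree = ast.parse("(\n" + fragment + "\n)")
    collector = UrlPatternCollector()
    collector.visit(tree)
    return collector.patterns

fragment = """# url(r'^MATCH1/$'),
#url(r'^MATCH2$'),
url(r'^MATCH3$'), # comment"""

print(collect_url_patterns(fragment))  # ['MATCH3']
```

Since comments never reach the syntax tree, commented-out entries are ignored no matter where the # sits on the line.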
Upvotes: 1
Reputation: 67978
^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))
Try this. Grab the capture group. See the demo:
https://www.regex101.com/r/rK5lU1/37
import re
p = re.compile(r'^\s*#.*$|(?<=url\(r\'\^)(.*?)(?=\$\'\))', re.IGNORECASE | re.MULTILINE)
test_str = "# url(r'^MATCH1/$'),\n #url(r'^MATCH2$'),\n url(r'^MATCH3$') # comment"
re.findall(p, test_str)
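One thing to be aware of with this approach: re.findall returns an empty string for every comment line that the first alternative consumes, because the capture group does not take part in those matches. A sketch of filtering them out (I dropped re.IGNORECASE here on the assumption that it isn't needed for this input):

```python
import re

# First alternative swallows whole comment lines; second captures the name.
pattern = re.compile(r"^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))", re.MULTILINE)
test_str = "# url(r'^MATCH1/$'),\n #url(r'^MATCH2$'),\n url(r'^MATCH3$') # comment"

captures = pattern.findall(test_str)
print(captures)  # ['', '', 'MATCH3']

# Keep only the matches where the capture group actually matched
names = [c for c in captures if c]
print(names)  # ['MATCH3']
```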
Upvotes: 1