Dan

Reputation: 454

Negative lookbehind in Python regular expressions

I am trying to parse a list of data out of a file using Python, but I don't want to extract any data that is commented out. An example of how the data is structured:

#commented out block
uncommented block
#   commented block

I only want to retrieve the middle item, so I am trying to exclude the items with a hash at the start. The issue is that some hashes are directly next to the commented text and some aren't, and the expression I currently have only works when the hash is immediately adjacent, as in the first line above:

(?<!#)(commented)

I tried adding \s+ to the negative lookbehind, but then I get a complaint that the expression does not have an obvious maximum length. Is there any way to do what I'm attempting to do?

Thanks in advance,

Dan

Upvotes: 3

Views: 2222

Answers (4)

Ranel Padon

Reputation: 605

I had a similar use case when parsing CI/YAML files. I found it simpler to remove the commented lines with a regex first, and only then search:

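import re

# ci_file is an already-open file object; PATTERN is the regex you are searching for.
text = ci_file.read()

# Remove commented lines first: optional indentation, then '#', through the end of the line.
any_commented_line = r'^[ \t]*#.*\n?'
text = re.sub(any_commented_line, '', text, flags=re.MULTILINE)

# Search for the target pattern.
match = re.search(PATTERN, text)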

This simplified the logic in my case.
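
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample text and the names `COMMENT_LINE` and `strip_comment_lines` are made up for illustration, and it assumes that only full-line comments (optionally indented) should be dropped, not trailing `#` text on a data line:

```python
import re

# Hypothetical sample input standing in for the file contents.
text = "#commented out block\nuncommented block\n#   commented block\n"

# A whole line whose first non-blank character is '#'.
COMMENT_LINE = re.compile(r"^[ \t]*#.*\n?", re.MULTILINE)


def strip_comment_lines(source):
    """Return source with full-line comments removed."""
    return COMMENT_LINE.sub("", source)


cleaned = strip_comment_lines(text)
print(cleaned)  # uncommented block
```

Anchoring with `^` and `re.MULTILINE` keeps a `#` that appears in the middle of a data line from truncating that line.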

Upvotes: 0

SilentGhost

Reputation: 319561

Why use a regex? String methods would do just fine:

>>> s = """#commented out block
uncommented block
#   commented block
""".splitlines()
>>> for line in s:
    not line.lstrip().startswith('#')


False
True
False
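
To collect the uncommented lines rather than just test each one, the same check can go in a list comprehension. This is a minimal sketch; the names `text` and `kept` are only illustrative, and note that blank lines would be kept as well:

```python
text = """#commented out block
uncommented block
#   commented block
"""

# Keep only lines whose first non-whitespace character is not '#'.
kept = [line for line in text.splitlines()
        if not line.lstrip().startswith("#")]

print(kept)  # ['uncommented block']
```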

Upvotes: 6

ghostdog74

Reputation: 342333

>>> s = """#commented out block
... uncommented block
...    #   commented block
... """
>>> for i in s.splitlines():
...    if not i.lstrip().startswith("#"):
...       print(i)
...
uncommented block

Upvotes: 0

JoshD

Reputation: 12824

As SilentGhost indicated, a regular expression isn't the best solution to this problem, but I thought I'd address the negative lookbehind.

You thought of doing this:

(?<!#\s+)(commented)

This doesn't work, because Python's re module requires a lookbehind to match a fixed number of characters, and \s+ can match any number. You could do something like this:

(?<!#)(\s+commented)

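Here the whitespace is consumed by the capturing group rather than by the lookbehind, so the lookbehind stays fixed-width and the pattern compiles. It is only a partial fix, though: it rejects a hash followed by a single whitespace character, but with two or more the match can simply start at the second one, which is not directly preceded by #. It also requires whitespace before the word, and you'd have to strip that whitespace off the captured group. Again, string manipulation is better for what you're doing, but I wanted to show how negative lookbehind could work since you were asking.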
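
The behaviour of both patterns can be checked directly. The sketch below uses made-up sample strings, and the last pattern (`line_based`) is not from the question or the answers above: it is one possible alternative, assuming each item sits on its own line, that uses a negative lookahead anchored at the start of the line instead of a lookbehind.

```python
import re

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!#\s+)(commented)")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False

# Hypothetical sample strings for illustration.
samples = ["#commented out", "# commented out",
           "#   commented out", "is commented out"]

# The fixed-width workaround: whitespace moved out of the lookbehind.
workaround = re.compile(r"(?<!#)(\s+commented)")
workaround_hits = [s for s in samples if workaround.search(s)]
print(workaround_hits)  # ['#   commented out', 'is commented out']

# Assumed alternative: reject any line whose first non-blank character is '#'.
line_based = re.compile(r"^(?![ \t]*#).*?(commented)", re.MULTILINE)
line_hits = [s for s in samples if line_based.search(s)]
print(line_hits)  # ['is commented out']
```

A lookahead has no fixed-width restriction in `re`, which is why `[ \t]*` is allowed there.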

Upvotes: 4
