Reputation: 454
I am trying to parse a list of data out of a file using Python; however, I don't want to extract any data that is commented out. An example of the way the data is structured is:
#commented out block
uncommented block
# commented block
I am trying to retrieve only the middle item, so I want to exclude the items with a hash at the start. The issue is that some hashes are directly next to the commented items and some aren't, and the expression I currently have only works when the hash is directly next to the item, as in the first line of the example above:
(?<!#)(commented)
I tried adding \s+ to the negative lookbehind, but then I get a complaint that the expression does not have an obvious maximum length. Is there any way to do what I'm attempting to do?
Thanks in advance,
Dan
Upvotes: 3
Views: 2222
Reputation: 605
I had a similar use case parsing CI/YAML files. I found it simpler to remove the commented lines with a regex first, before searching:
import re
# ci_file is an already-open file object; PATTERN is whatever you are searching for.
text = ci_file.read()
# Remove commented lines first (lines whose first non-blank character is '#').
any_commented_line = r'(?m)^[ \t]*#.*\n?'
text = re.sub(any_commented_line, '', text)
# Search for the target pattern.
match = re.search(PATTERN, text)
This simplified the logic in my case.
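A self-contained sketch of the same idea, for anyone who wants to try it without a real file. The sample text and the target pattern are made up for illustration, and it assumes a comment is a whole line whose first non-blank character is #:

```python
import re

# Sample text standing in for the file contents (hypothetical data).
text = "#commented out block\nuncommented block\n  # commented block\n"

# A comment line: optional indentation, a hash, then the rest of the line.
COMMENT_LINE = re.compile(r"^[ \t]*#.*\n?", re.MULTILINE)

# Hypothetical target pattern, for illustration only.
PATTERN = re.compile(r"\w*commented block")

# Drop the comment lines, then search what is left.
stripped = COMMENT_LINE.sub("", text)
matches = PATTERN.findall(stripped)
print(matches)
```

Anchoring the pattern with ^ and re.MULTILINE keeps it from eating trailing comments on lines that also contain real data.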
Upvotes: 0
Reputation: 319561
Why use a regex? String methods would do just fine:
>>> s = """#commented out block
... uncommented block
... # commented block
... """.splitlines()
>>> for line in s:
...     not line.lstrip().startswith('#')
...
False
True
False
Upvotes: 6
Reputation: 342333
>>> s = """#commented out block
... uncommented block
... # commented block
... """
>>> for i in s.splitlines():
...     if not i.lstrip().startswith("#"):
...         print(i)
...
uncommented block
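If the data really comes from a file, the same check can be wrapped in a small generator. This is only a sketch: the function name is made up, io.StringIO stands in for an open file, and skipping blank lines is an extra assumption not stated in the question:

```python
import io


def uncommented_lines(lines):
    """Yield stripped, non-blank lines that do not start with '#'."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and lines whose first non-blank character is '#'.
        if stripped and not stripped.startswith("#"):
            yield stripped


# io.StringIO stands in for an open file (hypothetical contents).
sample = io.StringIO("#commented out block\nuncommented block\n# commented block\n")
result = list(uncommented_lines(sample))
print(result)
```

Because it iterates line by line, it should work on a real file object without reading the whole file into memory.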
Upvotes: 0
Reputation: 12824
As SilentGhost indicated, a regular expression isn't the best solution to this problem, but I thought I'd address the negative look behind.
You thought of doing this:
(?<!#\s+)(commented)
This doesn't work because Python's re module requires a look-behind to have a fixed width, and \s+ can match any number of characters. You could do something like this:
(?<!#)(\s+commented)
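This moves the whitespace out of the look-behind and into the group, so of course you'd have to strip the whitespace off the captured group. It is not robust, though: a hash followed by two or more spaces still slips through, because the match can simply start at the second space. Again, string manipulation is better for what you're doing, but I wanted to show how a negative look-behind could work since you were asking.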
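If you do want to stay with a regex, one alternative (not the look-behind approach above, but a negative lookahead anchored at the start of each line) avoids the fixed-width restriction entirely. A minimal sketch, with the sample text taken from the question and the variable names made up for illustration:

```python
import re

text = "#commented out block\nuncommented block\n# commented block\n"

# A variable-width look-behind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!#\s+)(commented)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative: a negative lookahead anchored at the start of each line.
# Lookaheads may be variable-width, so optional indentation is allowed.
NOT_COMMENTED = re.compile(r"^(?![ \t]*#)(.+)$", re.MULTILINE)
found = NOT_COMMENTED.findall(text)
print(lookbehind_ok, found)
```

This keeps whole lines rather than a single word, which seems closer to what the question is after.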
Upvotes: 4