Reputation: 3
I am trying to remove comments from a file I read and write the result to another file. A single-line comment may start at the beginning of a line or partway through it, and everything from the start of the comment to the end of that line should be removed.
Another answer suggested the code below, but it doesn't work for single-line comments that follow some useful code on the same line. I have some knowledge of lex, so I tried modifying the code to fit my needs, but I am stuck. Please help.
import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(?m)^ *#.*\n?', '', code)
print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))
Expected output:
Why so Serious?
bar foo
Actual output:
Why so Serious? #This comment doesn't get removed
bar foo
[newline]
[newline]
Upvotes: 0
Views: 566
Reputation:
You can use regex101.com to debug your regex and see what it's actually matching.
The (?m) flag changes the matching rules so that ^ matches the beginning of every line, rather than only the beginning of the entire string.
The ^ * part matches the start of a line followed by any number of space characters. (So hopefully there aren't any tabs!)
In plain English, your regex only matches Python comments that start at the beginning of a line, optionally preceded by spaces.
Other answers have already provided regexes that do what you want, so I won't repeat them here.
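To see the anchoring described above in action, here is a small sketch that runs the question's pattern through re.findall, which lists every substring the pattern would remove. The sample text is made up for illustration; note that neither the inline comment nor the tab-indented comment is matched.

```python
import re

# The pattern from the question: (?m) lets ^ match at the start of every line.
QUESTION_PATTERN = re.compile(r'(?m)^ *#.*\n?')

# Hypothetical sample covering the cases discussed above.
sample = (
    "#foo bar\n"
    "Why so Serious? #inline\n"
    "\t# tab-indented\n"
    "   # space-indented\n"
)

# findall returns each substring that re.sub would delete.
print(QUESTION_PATTERN.findall(sample))
# ['#foo bar\n', '   # space-indented\n']
```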
Upvotes: 0
Reputation: 3338
Your regex has an anchor, '^', which means the pattern can only match at the beginning of a line. Without it, the pattern mostly works.
You may also want to compile the regex ahead of time so you can re-use it without compiling each time:
import re
# Raw string (r'...') so that '\s' is passed to the regex engine as-is.
COMMENT_PATTERN = re.compile(r'\s*#.*\n?', re.MULTILINE)
def strip_comments(code):
    return COMMENT_PATTERN.sub('', str(code))
I've also replaced the space ' ' with '\s', which matches any whitespace character, such as tabs (and also newlines). Put the space back if you don't want that.
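One caveat with the pattern above: because '\s' also matches '\n', and the pattern consumes the newline after the comment, removing a trailing comment can glue two code lines together. The sketch below shows this, then uses an alternation instead (an assumed variant, not part of the original answer): a whole-line comment is removed together with its newline, while a trailing comment is removed without it.

```python
import re

# The answer's pattern, as a raw string. '\s*' can also swallow newlines.
LOOSE = re.compile(r'\s*#.*\n?')

# Assumed variant: either a whole-line comment together with its newline,
# or a trailing comment without its newline (so the code line keeps it).
COMMENT_PATTERN = re.compile(r'^[ \t]*#.*\n?|[ \t]*#.*', re.MULTILINE)


def strip_comments(code):
    return COMMENT_PATTERN.sub('', str(code))


sample = "x = 1  # set x\n# full-line comment\ny = 2\n"
print(repr(LOOSE.sub('', sample)))    # 'x = 1y = 2\n'
print(repr(strip_comments(sample)))   # 'x = 1\ny = 2\n'
```

On the question's sample input, strip_comments returns 'Why so Serious?\nbar foo\n', which matches the expected output.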
Upvotes: 1
Reputation: 528
I think simply scanning the string yourself could do the job better (and faster) than using re. Here's a working example:
def stripComments(code):
    codeWithoutComments = ""
    for i in code.splitlines():
        marker = False
        for j in i:
            if j == "#":
                marker = True
            if not marker:
                codeWithoutComments += j
        codeWithoutComments += "\n"
    return codeWithoutComments
print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))
Returned value:
"""
Why so Serious?
bar foo
"""
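Since the question's goal is to read one file and write the stripped text to another, here is a sketch of the same line-by-line idea using str.partition, plus a small file wrapper. The function names, the UTF-8 encoding, and the choice to drop comment-only lines (to match the question's expected output) are my assumptions. Like the loop above, it treats every '#' as the start of a comment, including one inside a string literal.

```python
def strip_comments(code):
    """Drop everything from the first '#' on each line (like the loop above).

    Unlike that loop, lines that held only a comment are removed entirely,
    and whitespace left just before a removed comment is trimmed.
    """
    kept = []
    for line in code.splitlines():
        before, hash_found, _comment = line.partition('#')
        if hash_found and not before.strip():
            continue  # whole-line comment: skip the line
        # Only trim when a comment was cut off; other lines stay untouched.
        kept.append(before.rstrip() if hash_found else line)
    return '\n'.join(kept)


def strip_comments_in_file(src_path, dst_path):
    """Read src_path, strip comments, and write the result to dst_path."""
    with open(src_path, encoding='utf-8') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(strip_comments(text))


if __name__ == '__main__':
    sample = "#foo bar\nWhy so Serious? #This comment doesn't get removed\nbar foo\n# buz"
    print(strip_comments(sample))
    # Why so Serious?
    # bar foo
```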
Upvotes: 0
Reputation: 59731
Try this:
import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(#.*)?\n?', '', code)
print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))
# Why so Serious? bar foo
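The output above ends up on a single line because the optional '\n?' is matched independently of the comment, so the pattern removes every newline, not just those that end a comment. A minimal tweak, sketched below as an assumption rather than the answer's own code, is to drop '\n?' so that line breaks survive. Lines that contained only a comment are then left empty; the optional helper drop_blank_lines (a hypothetical name) removes them, at the cost of also removing lines that were blank in the original input.

```python
import re


def strip_comments(code):
    # The answer's idea without the optional '\n?', so line breaks survive;
    # [ \t]* also removes the spaces or tabs just before the '#'.
    return re.sub(r'[ \t]*#.*', '', str(code))


def drop_blank_lines(text):
    # Optional follow-up: remove lines left empty by whole-line comments.
    # Caveat: this also removes lines that were blank in the original input.
    return '\n'.join(line for line in text.splitlines() if line.strip())


sample = """#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""

print(repr(strip_comments(sample)))  # '\nWhy so Serious?\nbar foo\n'
print(drop_blank_lines(strip_comments(sample)))
```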
Upvotes: 2