Adarsh Kumar

Reputation: 3

Remove '#' comments from a string (the comment may start in the middle of a line)

I am trying to read a file, remove its comments, and write the result to another file. A single-line comment may begin at the start of a line or partway through it. Everything from the start of the comment to the end of that line should be removed.

An answer elsewhere suggested the code below, but it doesn't work for single-line comments that follow useful code on the same line. I have some knowledge of lex, so I tried modifying the code to fit my needs, but I am stuck. Please help.

import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(?m)^ *#.*\n?', '', code)

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))

Expected output:

Why so Serious?

bar foo

Actual output:

Why so Serious? #This comment doesn't get removed

bar foo

[newline]

[newline]

Upvotes: 0

Views: 566

Answers (4)

user2201041

You can use regex101.com to debug your regex and see what it's actually matching.

(?m) changes the matching rules so that ^ matches at the beginning of each line, rather than only at the beginning of the entire string.

^ * matches the start of a line followed by any number of space characters. (So hopefully there aren't any tabs, because an indent made of tabs won't match!)

In plain English, your regex is matching only Python comments that come at the beginning of the line or after any number of spaces.
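To see this concretely, here is a small sketch that lists what the question's pattern matches. The `sample` string and the name `QUESTION_PATTERN` are made up for illustration; only the pattern itself comes from the question.

```python
import re

# The pattern from the question: with (?m), '^' anchors at the start of every line.
QUESTION_PATTERN = re.compile(r'(?m)^ *#.*\n?')

# Illustrative input: a comment at column 0, a trailing comment,
# a tab-indented comment, and a space-indented comment.
sample = "#foo bar\nWhy so Serious? #trailing\n\t# tab-indented\n  # space-indented\n"

matches = QUESTION_PATTERN.findall(sample)
print(matches)
# ['#foo bar\n', '  # space-indented\n']
```

The trailing comment and the tab-indented comment are both skipped, because neither is preceded only by spaces from the start of its line.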

Other answers have already provided regexes to do what you want, so I won't repeat them here.

Upvotes: 0

Jon Betts

Reputation: 3338

Your regex has an anchor '^' which means the pattern can only start at the beginning of the line. Without this it pretty much works.

You may also want to compile the regex ahead of time so you can reuse it without recompiling on each call:

import re

# Raw string, so the backslashes in '\s' and '\n' reach the regex engine unchanged
COMMENT_PATTERN = re.compile(r'\s*#.*\n?', re.MULTILINE)


def strip_comments(code):
    return COMMENT_PATTERN.sub('', str(code))

I've also replaced the space ' ' with '\s', which matches any whitespace character, including tabs and newlines. Change it back if you don't want that.
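One thing to watch with a pattern like `\s*#.*\n?`: because it consumes the newline after a comment (and `\s` can consume one before it), neighbouring lines can end up joined. The sketch below is one possible variant, not part of the original answer; it assumes you want to keep line breaks, so it matches only spaces and tabs before the `#` and stops before the newline. The names `TRAILING_COMMENT` and `strip_comments_keep_lines` are hypothetical.

```python
import re

# Hypothetical variant: only spaces/tabs before '#', and no newline in the match.
TRAILING_COMMENT = re.compile(r'[ \t]*#.*')


def strip_comments_keep_lines(code):
    """Remove '#' comments but leave every line break in place."""
    return TRAILING_COMMENT.sub('', str(code))


sample = "#foo bar\nWhy so Serious? #This comment doesn't get removed\nbar foo\n# buz"

joined = re.sub(r'\s*#.*\n?', '', sample)   # pattern that also eats newlines
kept = strip_comments_keep_lines(sample)    # variant that keeps them

print(repr(joined))  # 'Why so Serious?bar foo'
print(repr(kept))    # '\nWhy so Serious?\nbar foo\n'
```

Comment-only lines become empty lines with this variant rather than disappearing, which may or may not be what you want.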

Upvotes: 1

Nenri

Reputation: 528

I think simply walking through the string could do the job better (and faster) than using re. Here's a working example:

def stripComments(code):
    codeWithoutComments = ""
    for i in code.splitlines():
        marker = False
        for j in i:
            if j == "#":
                marker = True
            if not marker:
                codeWithoutComments += j
        codeWithoutComments += "\n"
    return codeWithoutComments

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))

Returned value:

"""
Why so Serious?
bar foo

"""

Upvotes: 0

javidcf

Reputation: 59731

Try this:

import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(#.*)?\n?', '', code)

print(stripComments("""#foo bar
Why so Serious? #This comment doesn't get removed
bar foo
# buz"""))
# Why so Serious? bar foo
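Note that the pattern above also removes every newline, so the remaining text ends up on one line. If you would rather keep one line per remaining line of code and drop lines that held only a comment, something like the following sketch might be closer to the expected output in the question. It is an assumption about what is wanted, and `strip_comments_drop_empty` is a hypothetical name.

```python
import re


def strip_comments_drop_empty(code):
    """Strip '#' comments; drop lines that contained nothing but a comment."""
    kept = []
    for line in str(code).splitlines():
        stripped = re.sub(r'[ \t]*#.*', '', line)
        # Keep the line if something is left, or if it was blank to begin with.
        if stripped or not line.strip():
            kept.append(stripped)
    return "\n".join(kept)


sample = "#foo bar\nWhy so Serious? #This comment doesn't get removed\nbar foo\n# buz"

one_line = re.sub(r'(#.*)?\n?', '', sample)
per_line = strip_comments_drop_empty(sample)

print(repr(one_line))  # 'Why so Serious? bar foo'
print(repr(per_line))  # 'Why so Serious?\nbar foo'
```

Like the other regex approaches here, this treats every `#` as the start of a comment, including one that appears inside a quoted string.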

Upvotes: 2
