SeF

Reputation: 4160

Replace % comments with # comments or block comments, depending on the number of consecutive commented lines, using a regular expression in Python

I would like to transform the following text:

some text
% comment line 1
% comment line 2
% comment line 3
some more text

into

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text

AND in the same file, when there is only one line commented, I would like it to go from

some text
% a single commented line
some more text

to

some text 
# a single commented line
some more text

So, when the two cases are in the same file, I would like to go from:

some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text

to

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text 
# a single commented line
some more text

What I have tried so far works for the second case:

re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ',  'some text \n% a single comment line\nsome more text')

but it also replaces % with # when more than one consecutive line is commented.

As for the first case, I have failed with:

re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""',  'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text') 

which repeats the """ at each line and conflicts with the case when only one line is commented.

Is there any way to count the consecutive lines where a regular expression is found and change pattern accordingly?

Thanks in advance for the help!

Upvotes: 1

Views: 314

Answers (2)

RomanPerekhrest

Reputation: 92854

Straightforwardly:

def reformat_comments(comments):
    # A lone comment becomes a "#" line; a run of several becomes a """ block.
    if len(comments) == 1:
        return '# ' + comments[0] + '\n'
    return '"""\n{}\n"""\n'.format('\n'.join(comments))

comments = []
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('% '):
            # Drop the leading "% " marker and keep only the comment text.
            comments.append(line[2:])
        elif comments:
            # A non-comment line ends the run: flush it, then print the line.
            print(reformat_comments(comments) + line)
            comments = []
        else:
            print(line)

# Flush a comment run that reaches the end of the file.
if comments:
    print(reformat_comments(comments), end='')

Sample output:

some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
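
One caveat: line.strip() in the loop above also removes the indentation of ordinary lines. If that matters, the same collect-and-flush idea can be sketched as a generator that only touches the comment runs. The names convert_comments, run and flush are just illustrative, and it assumes comments always start at column 0 with "% ":

```python
def convert_comments(lines):
    """Yield lines with '% ' comment runs rewritten; other lines pass through."""
    run = []  # comment texts collected from the current run of '% ' lines

    def flush():
        # One comment -> a '#' line; several -> a triple-quoted block.
        if len(run) == 1:
            yield '# ' + run[0]
        elif run:
            yield '"""'
            yield from run
            yield '"""'
        run.clear()

    for line in lines:
        line = line.rstrip('\n')  # keep leading whitespace intact
        if line.startswith('% '):
            run.append(line[2:])
        else:
            yield from flush()
            yield line
    yield from flush()  # a run may reach the end of the input


sample = (
    "some text\n"
    "% comment line 1\n"
    "% comment line 2\n"
    "    indented code\n"
    "% a single commented line\n"
    "some more text"
)
result = list(convert_comments(sample.splitlines()))
print('\n'.join(result))
```

Since it accepts any iterable of lines, it should also work directly on an open file object.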

Upvotes: 1

tobias_k

Reputation: 82899

While this is probably possible with a regular expression, I think this is much easier without one. You could e.g. use itertools.groupby to detect groups of consecutive commented lines, simply using str.startswith to check whether a line is a comment.

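import itertools

text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""

# groupby yields runs of consecutive lines that share the same key:
# True for lines starting with "%", False for everything else.
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
    if not k:
        # Ordinary lines are printed unchanged.
        for s in grp:
            print(s)
    else:
        grp = list(grp)
        if len(grp) == 1:
            # A single commented line becomes a "#" comment.
            # Note: lstrip("% ") strips all leading "%" and space characters.
            print("# " + grp[0].lstrip("% "))
        else:
            # Several consecutive commented lines become one block.
            print('"""')
            for s in grp:
                print(s.lstrip("% "))
            print('"""')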

This just prints the resulting text, but you can of course also collect the lines in a list and join them into a string to return. If comments can also start in the middle of a line, you can handle that in the if not k branch. There it would make sense to use re.sub, e.g. to distinguish a comment marker % from an escaped \%.
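
For completeness, here is a rough sketch of the regular-expression route. Rather than counting lines inside the pattern, it matches a whole run of consecutive "% " lines at once and lets a replacement function decide, based on how many lines the run has, which form to emit. The names COMMENT_RUN, _rewrite and convert are made up for this example, and it assumes "\n" line endings and comments that start at the beginning of a line:

```python
import re

# A run of one or more consecutive lines that start with "% ".
# With re.MULTILINE, ^ matches at every line start; the optional \n lets
# the last line of the text match even without a final newline.
COMMENT_RUN = re.compile(r'(?:^% .*\n?)+', re.MULTILINE)


def _rewrite(match):
    block = match.group(0)
    # Remember whether the run ended with a newline so it can be restored.
    trailing = '\n' if block.endswith('\n') else ''
    lines = [line[2:] for line in block.splitlines()]  # drop the "% " prefix
    if len(lines) == 1:
        return '# ' + lines[0] + trailing
    return '"""\n' + '\n'.join(lines) + '\n"""' + trailing


def convert(text):
    # re.sub accepts a function as the replacement; it is called once per run.
    return COMMENT_RUN.sub(_rewrite, text)


text = (
    "some text\n"
    "% comment line 1\n"
    "% comment line 2\n"
    "% comment line 3\n"
    "some more text\n"
    "some text\n"
    "% a single commented line\n"
    "some more text"
)
converted = convert(text)
print(converted)
```

This returns the converted text as a string instead of printing line by line, so it can be written straight back to the file.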

Upvotes: 2
