Reputation: 4160
I would like to transform the following text:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
into
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
And, in the same file, when only a single line is commented, I would like it to go from
some text
% a single commented line
some more text
to
some text
# a single commented line
some more text
So, when the two cases are in the same file, I would like to go from:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text
to
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
What I have tried so far works for the second case (a single commented line):
re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ', 'some text \n% a single comment line\nsome more text')
but it also replaces % with # when more than one consecutive line is commented.
For the first case (several consecutive commented lines), my attempt failed:
re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""', 'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text')
which repeats the """ on each line and conflicts with the case where only one line is commented.
Is there any way to count the consecutive lines where a regular expression is found and change pattern accordingly?
Thanks in advance for the help!
Upvotes: 1
Views: 314
Reputation: 92854
Straightforwardly:
def reformat_comments(comments):
    # A single comment becomes a '#' line; several become a triple-quoted block.
    if len(comments) == 1:
        return '# ' + comments[0]
    return '"""\n{}\n"""'.format('\n'.join(comments))

comments = []
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('% '):
            comments.append(line[2:])  # drop the leading '% '
        else:
            if comments:
                print(reformat_comments(comments))
                comments = []
            print(line)
# Flush a comment block that ends the file.
if comments:
    print(reformat_comments(comments))
Sample output:
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
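The same line-by-line idea can also be packaged as a generator, so that the result can be written to a file or tested instead of printed. This is only a sketch: the name convert_lines and its flush helper are made up for illustration and are not part of the code above.

```python
def convert_lines(lines):
    """Yield converted lines; `lines` is any iterable of strings (e.g. a file)."""
    pending = []  # bodies of the current run of '%' comment lines

    def flush():
        # Emit the buffered run: one line -> '#', several -> a """ block.
        if len(pending) == 1:
            yield "# " + pending[0]
        elif pending:
            yield '"""'
            yield from pending
            yield '"""'
        pending.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("%"):
            pending.append(line[1:].strip())
        else:
            yield from flush()
            yield line
    # A comment run may end the input.
    yield from flush()
```

With a file, this could be used as "\n".join(convert_lines(f)) and the result written wherever it is needed.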
Upvotes: 1
Reputation: 82899
While this is probably possible with a regular expression, I think it is much easier without one. You could, for example, use itertools.groupby to detect groups of consecutive commented lines, simply using str.startswith to check whether a line is a comment.
text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""
import itertools
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
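    if not k:
        # Ordinary lines are passed through unchanged.
        for s in grp:
            print(s)
    else:
        grp = list(grp)
        if len(grp) == 1:
            # A lone comment line becomes a '#' comment.
            print("# " + grp[0].lstrip("% "))
        else:
            # A run of comment lines becomes a triple-quoted block.
            print('"""')
            for s in grp:
                print(s.lstrip("% "))
            print('"""')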
This just prints the resulting text, but you can of course also collect it in a string variable and return it. If comments can also start in the middle of a line, you can check for that in the if not k block. There it would make sense to use re.sub, e.g. to differentiate between % and \%.
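For completeness, here is a sketch of how a pure regex version might look, since re.sub accepts a function as the replacement. The pattern matches a whole run of consecutive lines starting with %, and the callback picks the output format from the number of lines in the run. The names convert_comments, COMMENT_RUN and _replace_run are made up for this example, and it only handles comments that start at the beginning of a line.

```python
import re

# One or more consecutive lines that start with '%'.
# re.MULTILINE makes '^' match at the start of every line.
COMMENT_RUN = re.compile(r"(?:^%.*\n?)+", re.MULTILINE)

def _replace_run(match):
    block = match.group(0)
    # Keep the trailing newline (if any) so following text stays on its own line.
    trailing = "\n" if block.endswith("\n") else ""
    bodies = [line[1:].strip() for line in block.splitlines()]
    if len(bodies) == 1:
        return "# " + bodies[0] + trailing
    return '"""\n' + "\n".join(bodies) + '\n"""' + trailing

def convert_comments(text):
    """Return `text` with '%' comment runs rewritten, as a new string."""
    return COMMENT_RUN.sub(_replace_run, text)
```

Calling convert_comments(text) on the sample text above returns the converted string rather than printing it, so it can be written to a file or processed further.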
Upvotes: 2