Reputation: 203
I am new to regex in Python and am trying to replace a substring inside a string. The substring starts with a specific word and ends with two newline characters.
Below is what I tried:
import re
a=re.sub(r'Report from.+', r' ', 'To: [email protected]; Report from xxxxx \n Category\t Score\t \n xxxxxxxxxxx xxxxxxxxt \n xxxxxxx\t xxxxxxx\t \n\n original message\n')
Output:
To: [email protected];
Category Score
xxxxxxxxxxx xxxxxxxxt
xxxxxxx xxxxxxx
original message
Expected Output:
To: [email protected];
original message
I also tried:
re.sub(r'Report from.+\n', r' ', 'To: [email protected]; Report from xxxxx \n Category\t Score\t \n xxxxxxxxxxx xxxxxxxxt \n xxxxxxx\t xxxxxxx\t \n\n original message\n')
but it didn't even seem to match the literal "Report from".
I think I am half-way there. Can anyone please help?
Edit: I want to replace everything from "Report from" up to and including the first occurrence of two consecutive newline characters.
Upvotes: 0
Views: 343
Reputation: 158977
Consider writing a simple state machine to do this. You have two states: you are looking for the first line in a block, or you are in a block and looking for the blank line. ("Two consecutive newlines" is the same as "I see a blank line when I read through the file line by line".)
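from enum import Enum, auto

class LookFor(Enum):
    REPORT = auto()  # copying lines, watching for the start of a block
    BLANK = auto()   # inside a block, skipping lines until a blank line

state = LookFor.REPORT
with open(filename, 'r') as f:  # filename: path to your input file
    for line in f:
        if state == LookFor.REPORT:
            if line.startswith('Report from'):
                state = LookFor.BLANK  # drop this line and start skipping
            else:
                print(line, end='')
        elif state == LookFor.BLANK:
            if line == '\n':
                print(line, end='')  # keep the blank line as a separator
                state = LookFor.REPORT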
The specific code I've written makes some assumptions about what you're looking for, in particular that you can iterate through the input line by line and that "Report from" appears at the start of a line. You could adapt this to make more complex decisions about which state to switch to, or add more states as suits your application.
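The same two-state idea can also be wrapped in a generator that accepts any iterable of lines, which makes it easier to try out without a file. This is only a sketch: the function name `drop_report_blocks`, the `marker` parameter, and the sample lines are made up for illustration. It assumes the marker appears at the start of a line and that the blank line ending a block should be kept.

```python
from enum import Enum, auto


class LookFor(Enum):
    REPORT = auto()  # copying lines, watching for the marker line
    BLANK = auto()   # inside a block, waiting for a blank line


def drop_report_blocks(lines, marker='Report from'):
    """Yield lines, skipping each block from a marker line up to the next blank line."""
    state = LookFor.REPORT
    for line in lines:
        if state == LookFor.REPORT:
            if line.startswith(marker):
                state = LookFor.BLANK  # skip the marker line itself
            else:
                yield line
        elif line == '\n':
            # A blank line ends the block; keeping it is an assumption.
            state = LookFor.REPORT
            yield line


# Hypothetical sample input, one string per line.
sample = [
    'To: someone;\n',
    'Report from xxxxx\n',
    'Category\tScore\n',
    '\n',
    'original message\n',
]
result = list(drop_report_blocks(sample))
print(''.join(result), end='')
```

Because the generator only looks at one line at a time, it can be fed an open file object directly, for example `drop_report_blocks(f)` inside a `with open(...) as f:` block.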
Upvotes: 1
Reputation: 24221
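You want to use a non-greedy quantifier (.+? instead of .+) followed by \n\n to mark the 'end' of the substring you wish to replace. You also need flags=re.DOTALL so that . matches newline characters; without it, .+ stops at the end of the first line.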
import re
text = 'To: [email protected]; Report from xxxxx \n Category\t Score\t \n xxxxxxxxxxx xxxxxxxxt \n xxxxxxx\t xxxxxxx\t \n\n original message\n'
a=re.sub(r'Report from.+?\n\n', r'\n', text, flags=re.DOTALL)
print(a)
Output:
To: [email protected];
original message
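To see why both the ? and re.DOTALL matter, it can help to compare the variants side by side. The sketch below uses a made-up sample string (not the text from the question) that contains two blank lines, so the greedy and non-greedy patterns give different results.

```python
import re

# Hypothetical sample with two blank lines, so greedy and non-greedy differ.
sample = 'To: a; Report from x\nrow 1\nrow 2\n\nfirst message\n\nsecond message\n'

# Without DOTALL, '.' cannot cross a newline, so the pattern never reaches '\n\n'.
no_dotall = re.sub(r'Report from.+?\n\n', '\n', sample)

# Greedy with DOTALL: '.+' runs to the LAST blank line and eats 'first message' too.
greedy = re.sub(r'Report from.+\n\n', '\n', sample, flags=re.DOTALL)

# Non-greedy with DOTALL: '.+?' stops at the FIRST blank line.
lazy = re.sub(r'Report from.+?\n\n', '\n', sample, flags=re.DOTALL)

print(repr(no_dotall))
print(repr(greedy))
print(repr(lazy))
```

Only the last variant removes exactly the span from "Report from" up to the first pair of newlines.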
Upvotes: 1