kkumar

Reputation: 203

Replace a string starting with a word until two new line characters

I am new to regex in Python and am trying to replace a substring inside a string. The substring starts with a specific word and ends with two newline characters.

Below is what I tried:

import re
a=re.sub(r'Report from.+', r' ', 'To: [email protected];     Report from xxxxx \n     Category\t Score\t  \n xxxxxxxxxxx xxxxxxxxt  \n xxxxxxx\t xxxxxxx\t \n\n original message\n')

Output:

To: [email protected];      
     Category    Score    
 xxxxxxxxxxx xxxxxxxxt  
 xxxxxxx     xxxxxxx     

 original message

Expected Output:

To: [email protected];      
 original message

I also tried:

re.sub(r'Report from.+\n', r' ', 'To: [email protected];     Report from xxxxx \n     Category\t Score\t  \n xxxxxxxxxxx xxxxxxxxt  \n xxxxxxx\t xxxxxxx\t \n\n original message\n')

but it didn't even match the literal "Report from".

I think I am half-way there. Can anyone please help?

Edit: I want to replace everything that starts with "Report from", all the way to the first occurrence of two newline characters.

Upvotes: 0

Views: 343

Answers (2)

David Maze

Reputation: 158977

Consider writing a simple state machine to do this. You have two states: you are looking for the first line in a block, or you are in a block and looking for the blank line. ("Two consecutive newlines" is the same as "I see a blank line when I read through the file line by line".)

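from enum import Enum, auto

class LookFor(Enum):
  REPORT = auto()  # looking for the line that starts a report block
  BLANK = auto()   # inside a block, looking for the blank line that ends it

state = LookFor.REPORT
with open(filename, 'r') as f:  # filename: path to the text being filtered
  for line in f:
    if state == LookFor.REPORT:
      if line.startswith('Report from'):
        # Drop this line and everything up to the next blank line.
        state = LookFor.BLANK
      else:
        print(line, end='')
    elif state == LookFor.BLANK:
      if line == '\n':
        print(line, end='')
        state = LookFor.REPORT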

The specific code I've written makes some assumptions about what you're looking for, in particular that you can iterate through the input line by line and that "Report from" appears at the start of a line. You could adapt this to make more complex decisions about which state to switch to, or add more states as suits your application.
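As a rough sketch of one such adaptation, the version below works on a string already in memory and also handles "Report from" appearing in the middle of a line, as it does in the question. The function name strip_reports, its marker parameter, and the sample string are made up for illustration; it keeps whatever precedes the marker on its line and drops the terminating blank line, which is one possible choice among several.

```python
from enum import Enum, auto


class LookFor(Enum):
    REPORT = auto()  # scanning for the marker
    BLANK = auto()   # inside a block, scanning for the blank line


def strip_reports(text, marker='Report from'):
    """Hypothetical helper: drop each block from marker through the next blank line."""
    kept = []
    state = LookFor.REPORT
    for line in text.splitlines(keepends=True):
        if state == LookFor.REPORT:
            head, found, _rest = line.partition(marker)
            if found:
                # Keep the text before the marker and end that line.
                kept.append(head + '\n')
                state = LookFor.BLANK
            else:
                kept.append(line)
        elif line == '\n':
            # The blank line closes the block; it is not kept.
            state = LookFor.REPORT
    return ''.join(kept)


# Made-up sample shaped like the string in the question.
sample = 'To: someone;     Report from xxxxx \n     Category\t Score\t  \n\n original message\n'
result = strip_reports(sample)
print(repr(result))
```

Collecting the kept lines in a list and joining them at the end avoids repeated string concatenation and makes the function easy to test.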

Upvotes: 1

iacob

Reputation: 24221

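Add ? after .+ to make the match non-greedy, so that it stops at the first occurrence of two newline characters instead of the last. You also need the re.DOTALL flag, because by default . does not match a newline and the pattern would never get past the first line.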

import re

text = 'To: [email protected];     Report from xxxxx \n     Category\t Score\t  \n xxxxxxxxxxx xxxxxxxxt  \n xxxxxxx\t xxxxxxx\t \n\n original message\n'

a=re.sub(r'Report from.+?\n\n', r'\n', text, flags=re.DOTALL)

print(a)

Output:

To: [email protected];     
 original message
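To see what the flag and the ? each contribute, here is a small comparison on a made-up string containing two report blocks. The sample text and the variable names are assumptions for illustration only; the patterns are the ones from the question and from this answer.

```python
import re

# Hypothetical sample with two report blocks, each ended by a blank line.
sample = 'head Report from A \n row 1\n\n middle\nReport from B\n row 2\n\n tail\n'

# Without re.DOTALL, '.' stops at the first newline, so only the rest of
# each "Report from" line is replaced.
no_dotall = re.sub(r'Report from.+', ' ', sample)

# Greedy '.+' with re.DOTALL runs on to the LAST '\n\n' in the string,
# swallowing the text between the two blocks as well.
greedy = re.sub(r'Report from.+\n\n', '\n', sample, flags=re.DOTALL)

# Non-greedy '.+?' stops at the FIRST '\n\n' after each "Report from".
lazy = re.sub(r'Report from.+?\n\n', '\n', sample, flags=re.DOTALL)

print(repr(no_dotall))
print(repr(greedy))
print(repr(lazy))
```

Only the non-greedy version keeps the " middle" line that sits between the two blocks.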

Upvotes: 1
