AMisra

Reputation: 1889

Regular expression over multiple lines in Python

I have a string, Alltext, that contains text spanning multiple lines:

aaaaa    
D0  
aaaaa

text0...........


aaaaa                                      
D1  
aaaaa  
text 1 ..........


aaaaa  
D2  
aaaaa  
text 2    

I want to keep just the text parts (text0...., text 1, text 2, ...) and remove the indicator lines

aaaaa
D0
aaaaa, 

aaaaa
D1
aaaaa

and so on. These indicate the start of the next text segment. I tried this regular expression:

re.sub("[a]* \sD[0-9]*\\s[a] * ", " ",Alltext)

but this removes only D0, D1 and not the aaaaa lines. The output I get is:

aaaaa  
aaaaa   
text0  
aaaaa       
aaaaa  
text1 

How can I remove these aaaaa lines as well?

Upvotes: 1

Views: 108

Answers (2)

vks

Reputation: 67988

print(re.findall(r"^text.*$", x, re.M))

A simple findall should do this as well, provided every line you want to keep starts with "text".
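
A minimal, self-contained sketch of this approach. It assumes, as in the sample from the question, that every line to keep begins with the literal word "text"; the variable names x and kept and the shortened input are only illustrative.

```python
import re

# Shortened sample modelled on the question's input (illustrative only).
x = "aaaaa\nD0\naaaaa\n\ntext0...........\n\n\naaaaa\nD1\naaaaa\ntext 1 ..........\n"

# re.M makes ^ and $ match at the start and end of every line,
# so each line that begins with "text" is returned whole.
kept = re.findall(r"^text.*$", x, re.M)
print(kept)
```

Note that this returns a list of lines rather than a single cleaned string, and it would drop any real text line that does not start with "text".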

Upvotes: 1

Avinash Raj

Reputation: 174844

You don't need to put a single character inside a character class, and you don't need to double-escape \s if you write the pattern as a raw string:

a*\s*D[0-9]*\s*a*\s*
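
The two points above can be checked directly. The snippet below is only a sketch on a made-up sample string; the last two lines illustrate an additional observation not stated in the answer itself, namely that a space typed into the pattern matches only a literal space (unless re.X is used) and so cannot match the line break before D0.

```python
import re

# In a normal string "\\s" is the two characters \ and s, same as raw r"\s".
assert "\\s" == r"\s"

# Made-up sample for illustration.
sample = "aaaaa  \nD0  \naaaaa  \ntext0"

# A one-character class [a] matches exactly what a bare a matches.
assert re.findall(r"[a]+", sample) == re.findall(r"a+", sample)

# A literal space in the pattern cannot stand in for the newline before D0,
# whereas \s* can span the trailing spaces and the line break.
print(re.search(r"a* D", sample))
print(re.search(r"a*\s*D", sample))
```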

The Python code would be:

>>> import re
>>> s = """aaaaa    
D0  
aaaaa

text0...........


aaaaa                                      
D1  
aaaaa  
text 1 ..........


aaaaa  
D2  
aaaaa  
text 2  """
>>> m = re.sub(r'a*\s*D[0-9]*\s*a*\s*', r'', s)
>>> m
'text0...........\n\n\ntext 1 ..........\n\n\ntext 2  '
>>> print(m)
text0...........


text 1 ..........


text 2
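
One caveat with the pattern above: apart from the D, every part of it is optional, so it also removes any capital D (plus adjacent whitespace and a's) that appears inside the real text. If that matters, a stricter variant can anchor the match to whole lines. The following is a sketch under the assumption that each marker is exactly three lines (a's, D plus digits, a's); the name MARKER and the sample text are hypothetical.

```python
import re

# Made-up sample whose text contains a capital D.
s = "aaaaa    \nD0  \naaaaa\n\ntext0 Data\n\n\naaaaa   \nD1  \naaaaa  \ntext 1\n"

# Hypothetical stricter pattern: a line of a's, a line "D<digits>", another
# line of a's, each with optional trailing spaces or tabs, followed by any
# blank lines. re.M lets ^ match at the start of each line.
MARKER = re.compile(r"^a+[ \t]*\nD[0-9]+[ \t]*\na+[ \t]*\n+", re.M)

cleaned = MARKER.sub("", s)
print(repr(cleaned))

# For comparison, the looser pattern also eats part of the word "Data".
loose = re.sub(r"a*\s*D[0-9]*\s*a*\s*", "", "text0 Data")
print(repr(loose))
```

The blank lines between text segments are left in place, as in the output above; only complete marker blocks are removed.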

Upvotes: 1
