Reputation: 1889
I have a string Alltext that contains text across multiple lines:
aaaaa
D0
aaaaa
text0...........
aaaaa
D1
aaaaa
text 1 ..........
aaaaa
D2
aaaaa
text 2
I want to keep just the text parts, i.e. text0...., text 1..., text 2..., and remove the indicators
aaaaa
D0
aaaaa,
aaaaa
D1
aaaaa
and so on. These indicate the start of the next text segment. I tried this regular expression:
re.sub("[a]* \sD[0-9]*\\s[a] * ", " ",Alltext)
but this removes only D0, D1 and not the aaaaa lines. The output I get is:
aaaaa
aaaaa
text0
aaaaa
aaaaa
text1
How can I remove these aaaaa lines?
Upvotes: 1
Views: 108
Reputation: 67988
print(re.findall(r"^text.*$", x, re.M))
A simple findall should do this as well.
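For reference, here is a self-contained sketch of that findall approach. The variable name `all_text` and the sample string are assumptions modeled on the question, and the pattern only works if every wanted line literally starts with "text".

```python
import re

# Sample input modeled on the question (assumed, not the asker's real data).
all_text = (
    "aaaaa\nD0\naaaaa\ntext0...........\n"
    "aaaaa\nD1\naaaaa\ntext 1 ..........\n"
    "aaaaa\nD2\naaaaa\ntext 2"
)

# re.M makes ^ and $ match at the start and end of every line,
# so each line beginning with "text" is returned as one item.
segments = re.findall(r"^text.*$", all_text, re.M)
print(segments)
# ['text0...........', 'text 1 ..........', 'text 2']
```

If the real segments do not start with a fixed word, removing the marker lines is probably the safer direction.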
Upvotes: 1
Reputation: 174844
You don't need to put a single character inside a character class, and you don't need to double-escape \s:
a*\s*D[0-9]*\s*a*\s*
The Python code would be:
>>> import re
>>> s = """aaaaa
D0
aaaaa
text0...........
aaaaa
D1
aaaaa
text 1 ..........
aaaaa
D2
aaaaa
text 2 """
>>> m = re.sub(r'a*\s*D[0-9]*\s*a*\s*', r'', s)
>>> m
'text0...........\ntext 1 ..........\ntext 2 '
>>> print(m)
text0...........
text 1 ..........
text 2
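One caveat with the pattern above: because every part except the D is optional, it also matches a bare "D" (plus any neighbouring "a"s) inside ordinary text. A stricter variant, sketched below, only removes a marker when it occupies three whole lines. The helper name `strip_markers` and the assumption that markers always look exactly like aaaaa / D<number> / aaaaa are mine, not from the question.

```python
import re

# Assumed marker layout: a line of a's, a line "D<digits>", another line of a's.
# re.M lets ^ anchor at the start of each line, so text lines are never touched.
MARKER = re.compile(r"^a+\nD[0-9]+\na+\n", re.M)

def strip_markers(text):
    """Remove every three-line marker block and keep the text lines as they are."""
    return MARKER.sub("", text)

s = (
    "aaaaa\nD0\naaaaa\ntext0...........\n"
    "aaaaa\nD1\naaaaa\ntext 1 ..........\n"
    "aaaaa\nD2\naaaaa\ntext 2 "
)
cleaned = strip_markers(s)
print(cleaned)

# A text line that merely contains "D" or "a" is left alone by the strict pattern,
# while the looser pattern would alter it.
tricky = "Data and Drama\n"
loose = re.sub(r"a*\s*D[0-9]*\s*a*\s*", "", tricky)
```

With the sample input, `cleaned` keeps the three text lines separated by single newlines, and `strip_markers(tricky)` returns the line unchanged.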
Upvotes: 1