Reputation: 4301
Given the string:
string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
I want to extract the first three sentences, that is,
This is the first sentence.\nIt is the second one.\nNow, this is the third one.
Apparently, the following regular expression does not work:
re.search('(?<=This)(.*?)(?=\n)', string)
What is the correct expression for extracting the text between This and the third \n?
Thanks.
Upvotes: 0
Views: 98
Reputation: 46759
Try the following:
import re
string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)
Giving you:
is the first sentence.
It is the second one.
Now, this is the third one.
This assumes there was a missing n before Now. If you wish to keep This in the result, you can move it inside the capturing group, that is, after the (.
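As a minimal sketch of that variation (the names text, match and extracted are illustrative, not from the original answer), moving This inside the parentheses makes it part of group(1):

```python
import re

text = ('Other unwanted text here and start here: This is the first sentence.\n'
        'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# "This" now sits inside the capturing group, so it is kept in group(1).
# Each lazy .*? stops at the next newline because "." does not match "\n" by default.
match = re.search(r'(This.*?\n.*?\n.*?)\n', text)

# Guard against a failed search instead of calling .group() on None.
extracted = match.group(1) if match else None
print(extracted)
```

The guard matters because re.search returns None when nothing matches, and calling .group(1) on None raises AttributeError.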
Upvotes: 0
Reputation: 18357
You can use this regex to capture three sentences starting with the text This:
This(?:[^\n]*\n){3}
Edit:
Python code,
import re
s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
m = re.search(r'This(?:[^\n]*\n){3}',s)
if m:
    print(m.group())
Prints,
This is the first sentence.
It is the second one.
Now, this is the third one.
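If the number of lines may vary, the same pattern can be built from a count. This is only a sketch: first_n_lines is a hypothetical helper, not part of the re module, and it assumes every wanted line ends with \n.

```python
import re

def first_n_lines(text, start, n):
    """Return n newline-terminated lines beginning at the first `start`, or None."""
    # re.escape protects any regex metacharacters in the start marker.
    pattern = re.escape(start) + r'(?:[^\n]*\n){' + str(n) + '}'
    m = re.search(pattern, text)
    # Strip the final newline so the result ends at the last sentence.
    return m.group().rstrip('\n') if m else None

s = ('Other unwanted text here and start here: This is the first sentence.\n'
     'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

print(first_n_lines(s, 'This', 3))
```

The helper returns None when the marker is missing or fewer than n complete lines follow it.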
Upvotes: 1
Reputation: 696
(?s)(This.*?)(?=\nThis)
Make the . match newlines with (?s), then look for a sequence starting with This and followed by \nThis.
Don't forget that the __repr__ of the search result doesn't print the whole matched string, so you'll need to index the match to get the full text:
print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
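A short sketch of the difference, assuming CPython 3.6 or later (where a match object can be indexed); the variable names are illustrative:

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

m = re.search('(?s)(This.*?)(?=\nThis)', string)

# Printing the match object shows its repr, which only previews the matched text.
print(m)

# m[0] (equivalent to m.group(0)) returns the complete matched string.
full = m[0]
print(full)
print(len(full))
```

Here the lookahead (?=\nThis) stops the lazy match just before the fourth line without consuming it.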
Upvotes: 0
Reputation: 2981
Jerry's right: regex isn't the right tool for the job, and there are much easier and more efficient ways of tackling the problem:
this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
print('\n'.join(this.split('\n', 3)[:-1]))
OUTPUT:
This is the first sentence.
It is the second one.
Now, this is the third one.
If you just want to practice using regex, following a tutorial would be much easier.
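The snippet above starts from a string that already begins with This. For the original string, with unwanted text in front, one possible regex-free sketch is to locate the marker with str.find first (the names text, start and result are illustrative):

```python
text = ('Other unwanted text here and start here: This is the first sentence.\n'
        'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# str.find returns -1 when the marker is absent.
start = text.find('This')

if start != -1:
    # Split at most three times, then keep the first three pieces.
    lines = text[start:].split('\n', 3)[:3]
    result = '\n'.join(lines)
else:
    result = None

print(result)
```

Slicing with [:3] rather than [:-1] avoids dropping a real line when the text contains fewer than three newlines after the marker.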
Upvotes: 0