Chan

Reputation: 4301

How to get sub-string between two repetitive keywords in Python

Given the string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences, that is:

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Apparently, the following regular expression does not work:

re.search('(?<=This)(.*?)(?=\n)', string)

What is the correct expression for extracting text between This and the third \n?

Thanks.

Upvotes: 0

Views: 98

Answers (4)

Martin Evans

Reputation: 46759

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

Giving you:

 is the first sentence.
It is the second one.
Now, this is the third one.

This assumes the string in the question was missing an n before Now, so that the separator there is \n. If you wish to keep This in the result, move it inside the parentheses of the capturing group.
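As a minimal sketch of that variant, here is the same pattern with This moved inside the capturing group. The `match` variable and the None check are additions for illustration and are not part of the snippet above.

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# "This" now sits inside the capturing group, so group(1) includes it.
match = re.search(r'(This.*?\n.*?\n.*?)\n', string)

# Guard against a failed search instead of calling .group() on None.
extracted_text = match.group(1) if match else None
print(extracted_text)
```

Because . does not match a newline by default, each lazy .*? stays within a single line, so the group covers exactly three lines.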

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

You can use this regex to capture three sentences starting with the text This:

This(?:[^\n]*\n){3}


Edit:

Python code:

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'This(?:[^\n]*\n){3}', s)
if m:
    print(m.group())

Prints:

This is the first sentence.
It is the second one.
Now, this is the third one.
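Note that the pattern above also consumes the newline after the third sentence, so the match ends with \n. If that trailing newline is unwanted, one possible rearrangement (a sketch, not the only option) matches the first line and then exactly two more newline-prefixed lines. The `result` name is just for illustration.

```python
import re

s = ('Other unwanted text here and start here: This is the first sentence.\n'
     'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# First line from "This", then exactly two more lines, each led by its newline.
# The newline after the third line is left out of the match.
m = re.search(r'This[^\n]*(?:\n[^\n]*){2}', s)
result = m.group() if m else None
print(repr(result))
```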

Upvotes: 1

WofWca

Reputation: 696

(?s)(This.*?)(?=\nThis)

Make . match newlines as well with (?s), then look for the shortest sequence that starts with This and is followed by \nThis.

Don't forget that the __repr__ of the search result doesn't show the whole matched string, so to see all of it you'll need to index the match:

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
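For comparison, the same idea can be written with the re.DOTALL flag instead of the inline (?s), plus a check for a failed search. This is a sketch under the same assumption as the pattern above: the unwanted text must begin with This at the start of a line, otherwise the lookahead never succeeds. The names `pattern` and `result` are illustrative.

```python
import re

string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

# re.DOTALL has the same effect as the inline (?s): "." also matches newlines.
pattern = re.compile(r'This.*?(?=\nThis)', re.DOTALL)

match = pattern.search(string)
# The lookahead is zero-width, so "\nThis" is not part of the match.
result = match.group(0) if match else None
print(result)
```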

Upvotes: 0

Nordle

Reputation: 2981

Jerry's right: regex isn't the right tool for this job, and there are much easier and more efficient ways of tackling the problem:

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

OUTPUT:

This is the first sentence.
It is the second one.
Now, this is the third one.
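The snippet above assumes the string already starts at This. For the string from the question, which has unwanted text in front, a sketch along the same lines could first locate This with str.find and then split. The `start` and `result` names are illustrative.

```python
string = ('Other unwanted text here and start here: This is the first sentence.\n'
          'It is the second one.\nNow, this is the third one.\nThis is not I want.\n')

result = None
start = string.find('This')  # index of the first "This", or -1 if absent
if start != -1:
    # Split at most three times, then keep the first three lines.
    lines = string[start:].split('\n', 3)[:3]
    result = '\n'.join(lines)

print(result)
```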

If you just want to practice using regex, following a tutorial would be much easier.

Upvotes: 0
