Ganga

Reputation: 923

Python 2: Regex to get text anywhere between two strings

I am trying to find a regex to get the text between Explanation One: and Explanation Two:

The trick is that the text may or may not exist, and it could be on the same line as Explanation One: or on the next line. The current regex in the code below also returns an extra empty line after the text it finds, just before Explanation Two:

Any pointers on getting just the text, ignoring the additional empty lines, would be appreciated.

import re

STRING="""Explanation One:
Blah Blah

Explanation Two: ndnlnlkn
"""

pattern = r'Explanation One:[\r\n ].*(?=Explanation Two:)+'
regex = re.compile(pattern, re.IGNORECASE)
print regex.search(STRING).group()

Output:

Explanation One: 
Blah Blah

Upvotes: 1

Views: 152

Answers (2)

The fourth bird

Reputation: 163297

To match the text between Explanation One: and Explanation Two:, you could capture it in a group and use the DOTALL flag, or the inline modifier (?s), to make the dot also match a newline.

Explanation One:\s*(.*?)\s*Explanation Two

Explanation

  • Explanation One: Match literally
  • \s* Match a whitespace character zero or more times
  • (.*?) Capture in a group any character zero or more times, non-greedy
  • \s* Match a whitespace character zero or more times
  • Explanation Two Match literally
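A minimal Python sketch of this approach, assuming input laid out like the question's STRING; the helper name extract_between is hypothetical. It applies the pattern with re.DOTALL and checks that the text is found whether it sits on the next line, on the same line, or is missing entirely, and that the inline (?s) modifier gives the same result. It uses single-argument print(...) so it runs under both Python 2 and 3:

```python
import re

# Pattern from the answer: \s* on both sides swallows the surrounding
# spaces/newlines, and the lazy (.*?) stops at the first "Explanation Two".
PATTERN = r'Explanation One:\s*(.*?)\s*Explanation Two'


def extract_between(text):
    """Return the text between the two markers, or None if there is no match.

    extract_between is a hypothetical helper used only for illustration.
    """
    match = re.search(PATTERN, text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


next_line = "Explanation One:\nBlah Blah\n\nExplanation Two: ndnlnlkn\n"
same_line = "Explanation One: Blah Blah\n\nExplanation Two: ndnlnlkn\n"
empty = "Explanation One:\n\nExplanation Two: ndnlnlkn\n"

print(repr(extract_between(next_line)))  # 'Blah Blah'
print(repr(extract_between(same_line)))  # 'Blah Blah'
print(repr(extract_between(empty)))      # ''

# The inline (?s) modifier at the start of the pattern is equivalent to re.DOTALL.
inline = re.search(r'(?s)Explanation One:\s*(.*?)\s*Explanation Two', next_line)
print(repr(inline.group(1)))             # 'Blah Blah'
```

Because the \s* before and after the group absorbs the blank lines, the captured text needs no extra cleanup.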

Upvotes: 2

Tim Biegeleisen

Reputation: 521194

The problem with your current approach is that you are not running the regex in DOTALL mode. This means that .* will not match across lines, yet that is precisely what you need it to do to reach the Explanation Two: marker text. One way around this is to match the following:

[\s\S]*

This matches any character, whitespace or non-whitespace, so it matches everything, even across lines.
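To see the difference concretely, here is a small sketch (runnable under Python 2 and 3; the sample text mirrors the question's STRING) comparing a plain ., the [\s\S] class, and . with re.DOTALL on input that contains newlines:

```python
import re

text = "Explanation One:\nBlah Blah\n\nExplanation Two: ndnlnlkn\n"

# Without DOTALL, "." stops at "\n", so the lookahead for the second
# marker can never be reached and the search fails.
plain_dot = re.search(r'Explanation One:(.*)(?=Explanation Two:)', text)

# [\s\S] is a character class containing every character, newlines included.
any_char = re.search(r'Explanation One:([\s\S]*)(?=Explanation Two:)', text)

# re.DOTALL makes "." behave the same way as [\s\S].
dotall = re.search(r'Explanation One:(.*)(?=Explanation Two:)', text, re.DOTALL)

print(plain_dot)                             # None
print(repr(any_char.group(1)))               # '\nBlah Blah\n\n'
print(any_char.group(1) == dotall.group(1))  # True
```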

pattern = r'Explanation One:([\s\S]*)(?=Explanation Two:)'
searchObj = re.search(pattern, STRING, re.M|re.I)
print searchObj.group(1)

Blah Blah
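The printed output hides the fact that group(1) still contains the newline after Explanation One: and the blank line before Explanation Two:. If only the text itself is wanted, as the question asks, one simple option is to strip the captured group. A sketch, reusing the question's STRING and the pattern above:

```python
import re

STRING = """Explanation One:
Blah Blah

Explanation Two: ndnlnlkn
"""

pattern = r'Explanation One:([\s\S]*)(?=Explanation Two:)'
searchObj = re.search(pattern, STRING, re.M | re.I)

raw = searchObj.group(1)
print(repr(raw))          # '\nBlah Blah\n\n'

# str.strip() removes the leading/trailing newlines and spaces,
# leaving just the text between the markers.
print(repr(raw.strip()))  # 'Blah Blah'
```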

By the way, an alternative would be to keep using .* in your pattern and add the re.DOTALL flag to the re.search call. So the following should also work:

pattern = r'Explanation One:(.*)(?=Explanation Two:)'
searchObj = re.search(pattern, STRING, re.M|re.I|re.DOTALL)
print searchObj.group(1)
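One thing to keep in mind with the greedy (.*) under re.DOTALL: if the input contains Explanation Two: more than once, the match runs up to the last occurrence. A minimal sketch, assuming a hypothetical input with two such markers, comparing it with the non-greedy (.*?):

```python
import re

# Hypothetical input with two "Explanation Two:" markers.
text = ("Explanation One:\nFirst\n\nExplanation Two: a\n"
        "More text\nExplanation Two: b\n")

greedy = re.search(r'Explanation One:(.*)(?=Explanation Two:)', text, re.DOTALL)
lazy = re.search(r'Explanation One:(.*?)(?=Explanation Two:)', text, re.DOTALL)

# Greedy (.*) backtracks only as far as the LAST "Explanation Two:".
print(repr(greedy.group(1)))  # '\nFirst\n\nExplanation Two: a\nMore text\n'
# Non-greedy (.*?) stops at the FIRST "Explanation Two:".
print(repr(lazy.group(1)))    # '\nFirst\n\n'
```

With a single Explanation Two: in the input, as in the question, both forms capture the same text.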

Upvotes: 1
