How to use REGEX with multiple filters

Question

The variable text describes three DAY blocks:

text = """
DAY {
 foo 12 5 A
 foo 
 12345
}
DAY {
 day 1
 day 2
 file = "/Users/Shared/docs/doc.txt"
 day 3
 end of the month
}
DAY {
 01.03.2016 11:15
 01.03.2016 11:16
 01.03.2016 11:17
}"""

Each DAY definition begins with the word DAY at the start of a line, followed by a space and an opening curly bracket. It ends with a closing curly bracket, which is always placed at the start of a line. The boundaries of each DAY are therefore defined by the curly brackets {}.

Using regex, I need to find the DAY that contains the line file = "/Users/Shared/docs/doc.txt" within its boundaries.

I started writing a regular expression:

string = """DAY { [A-Za-z0-9]+}"""

result = re.findall(string, text)

But the expression stops matching at the end of foo, right before the whitespace character. How can I modify the expression so that it returns the second DAY, the one with file = "/Users/Shared/docs/doc.txt" in its body? The result should look like:

DAY {
 day 1
 day 2
 file = "/Users/Shared/docs/doc.txt"
 day 3
 end of the month
}
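As a quick check of the attempt above, here is a minimal sketch with the sample data repeated so that it runs on its own. The names attempt and any_block are illustrative and not part of the original post. The attempt returns an empty list: the pattern expects a space right after the opening bracket, where the text has a newline, and the class [A-Za-z0-9]+ matches neither spaces nor newlines. Allowing any character except a curly bracket in the body is enough to match whole blocks.

```python
import re

text = """
DAY {
 foo 12 5 A
 foo 
 12345
}
DAY {
 day 1
 day 2
 file = "/Users/Shared/docs/doc.txt"
 day 3
 end of the month
}
DAY {
 01.03.2016 11:15
 01.03.2016 11:16
 01.03.2016 11:17
}"""

# The original attempt: "DAY { " followed only by letters/digits, then "}".
attempt = re.findall(r"DAY { [A-Za-z0-9]+}", text)
print(attempt)

# Illustrative pattern: the body may hold anything except a curly bracket,
# which includes spaces and newlines.
any_block = re.compile(r"DAY \{[^{}]*\}")
all_blocks = any_block.findall(text)
print(len(all_blocks))
```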

TurtleIzzy · Accepted Answer

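The pattern does not need the dot to cross line breaks: the negated character class [^{}] matches any character except a curly bracket, newlines included. Compile the regex with re.MULTILINE so that ^ matches at the start of every line, which anchors both DAY and the closing bracket as described in the question.

The following code should return the block you asked for.

import re

regex = re.compile(r'^DAY\s*\{[^{}]*file = "/Users/Shared/docs/doc\.txt"[^{}]*^\}', re.MULTILINE)
regex.findall(text)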

Result:

['DAY {\n day 1\n day 2\n file = "/Users/Shared/docs/doc.txt"\n day 3\n end of the month\n}']
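
Since the title asks about multiple filters, the same result can also be reached in two steps, which may be easier to extend: first collect every DAY block, then keep only the blocks that contain the wanted line. This is a sketch rather than the only way to do it. The names TARGET_LINE, block_pattern, blocks and matching are illustrative, and the sample data is repeated so that the snippet runs on its own.

```python
import re

text = """
DAY {
 foo 12 5 A
 foo 
 12345
}
DAY {
 day 1
 day 2
 file = "/Users/Shared/docs/doc.txt"
 day 3
 end of the month
}
DAY {
 01.03.2016 11:15
 01.03.2016 11:16
 01.03.2016 11:17
}"""

# The line a block must contain to be kept.
TARGET_LINE = 'file = "/Users/Shared/docs/doc.txt"'

# Filter 1: every block that starts with "DAY {" at the beginning of a line
# and ends with "}" at the beginning of a line. re.MULTILINE makes ^ match
# after each newline; [^{}]* spans lines because it also matches "\n".
block_pattern = re.compile(r"^DAY \{[^{}]*^\}", re.MULTILINE)
blocks = block_pattern.findall(text)

# Filter 2: keep a block if one of its lines, ignoring surrounding
# whitespace, is exactly the target line.
matching = [
    block
    for block in blocks
    if any(line.strip() == TARGET_LINE for line in block.splitlines())
]

for block in matching:
    print(block)
```

Splitting the work this way keeps the regex short, and further conditions can be added as ordinary Python tests on each block instead of being folded into one pattern.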
