Reputation: 17
I need help with a regular expression that grabs exactly n lines of text between two regex matches. For example, I need 17 lines of text, and I used the code below, which does not work.
Please see sample code below:
import re
match_string = re.search(r'^.*MDC_IDC_RAW_MARKER((.*?\r?\n){17})Stored_EGM_Trigger.*\n', t, re.DOTALL).group()
value1 = re.search(r'value="(\d+)"', match_string).group(1)
value2 = re.search(r'value="(\d+\.\d+)"', match_string).group(1)
print(match_string)
print(value1)
print(value2)
I put a sample string here, because SO does not allow long code strings: https://hastebin.com/aqowusijuc.xml
Upvotes: 0
Views: 265
Reputation: 44043
You are getting false positives because you are using the re.DOTALL flag, which allows the . character to match newline characters. That is, when you are matching ((.*?\r?\n){17}), each .*? can run across several lines, so the . could eat up many extra newline characters just to satisfy your required count of 17. The \r? is also superfluous: without re.DOTALL, the . still matches a carriage return, so .*\n already covers lines that end in \r\n. Also, starting your regex with ^.* is superfluous because you are forcing the search to start from the beginning but then saying that the search engine should skip as many characters as necessary to find MDC_IDC_RAW_MARKER; re.search already scans forward to the first place where the pattern matches. So, a simplified and correct regex would be:
match_string = re.search(r'MDC_IDC_RAW_MARKER.*\n((.*\n){17})Stored_EGM_Trigger.*\n', t).group()
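To see the difference, here is a minimal sketch that runs both patterns against made-up text. The make_text and count_lines helpers and the sample line layout are hypothetical stand-ins for illustration, not the real data from the linked file, and the sketch assumes Stored_EGM_Trigger begins its own line.

```python
import re

# Hypothetical stand-in for the real data: a marker line, n payload lines,
# then the trigger line. The real file layout is an assumption here.
def make_text(n_lines):
    body = "".join(f'line {i} value="{i}"\n' for i in range(n_lines))
    return f"MDC_IDC_RAW_MARKER start\n{body}Stored_EGM_Trigger end\n"

# Pattern from the question, compiled with re.DOTALL.
ORIGINAL = re.compile(
    r'^.*MDC_IDC_RAW_MARKER((.*?\r?\n){17})Stored_EGM_Trigger.*\n', re.DOTALL)

# Simplified pattern without re.DOTALL, so "." stops at each newline.
FIXED = re.compile(r'MDC_IDC_RAW_MARKER.*\n((.*\n){17})Stored_EGM_Trigger.*\n')

def count_lines(pattern, text):
    """Return the number of newlines captured in group 1, or None if no match."""
    m = pattern.search(text)
    return None if m is None else m.group(1).count("\n")

too_long = make_text(25)  # 25 payload lines between marker and trigger
exact = make_text(17)     # exactly 17 payload lines

print(count_lines(ORIGINAL, too_long))  # matches anyway, capturing far more than 17 lines
print(count_lines(FIXED, too_long))     # None: the line count is wrong, so no match
print(count_lines(FIXED, exact))        # 17
```

With 25 lines in between, the re.DOTALL pattern still matches because one of the 17 repetitions stretches over the surplus lines, while the simplified pattern only matches when exactly 17 lines follow the marker line. Note that re.search returns None when nothing matches, so it is safer to check the result before calling .group() on it.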
Upvotes: 1