Reputation: 3426
I have a large multi-line string containing multiple entries that follow a similar format. I'd like to split it into a list of strings, one per entry.
I tried the following:
myre = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
return myre.findall(text)
In this case, entries start with 'Record Time' and end with '-----'. Instead of one item per entry, the code above returns a single item that starts at the beginning of the first entry and ends at the end of the last one.
I could probably make this work by using a regex to find the end of one entry and then repeating the search on a slice of the original text starting there, but that seems messy.
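For comparison, the manual "find the end, then continue from there" idea can be written without slicing, because a compiled pattern's search() method accepts a start position. This is a minimal sketch, assuming every entry starts with 'Record Time' and ends with five dashes; the function name split_records_manually and the sample text are hypothetical:

```python
import re

# Hypothetical helper: find each start marker, then the first terminator
# after it, and resume searching where the previous entry ended.
START = re.compile(r'Record\sTime')
END = re.compile(r'-{5}')

def split_records_manually(text):
    records = []
    pos = 0
    while True:
        start = START.search(text, pos)
        if start is None:
            break
        end = END.search(text, start.end())
        if end is None:
            break  # an unterminated trailing entry is ignored
        records.append(text[start.start():end.end()])
        pos = end.end()
    return records

sample = "Record Time\n1\n2\n-----\nRecord Time\n3\n-----\n"
print(split_records_manually(sample))
# ['Record Time\n1\n2\n-----', 'Record Time\n3\n-----']
```

It works, but a single non-greedy pattern does the same job in one findall() call.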
Upvotes: 2
Views: 2086
Reputation: 104092
Something like this:
txt='''\
Record Time
1
2
3
-----
Record Time
4
5
-----
Record Time
6
7
8
'''
import re
pat=re.compile(r'^Record Time$(.*?)(?:^-{5}|\Z)', re.S | re.M)
for i, block in enumerate(m.group(1) for m in pat.finditer(txt)):
    print('block:', i)
    print(block.strip())
Prints:
block: 0
1
2
3
block: 1
4
5
block: 2
6
7
8
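Because that pattern has exactly one capturing group, re.findall() returns the group's text for each match, so the same pattern can produce the list of entries the question asks for directly. A minimal sketch reusing the sample text above; the variable name entries is illustrative:

```python
import re

txt = """\
Record Time
1
2
3
-----
Record Time
4
5
-----
Record Time
6
7
8
"""

pat = re.compile(r'^Record Time$(.*?)(?:^-{5}|\Z)', re.S | re.M)

# findall() returns group 1 of every match; strip() drops the
# newlines around each entry's body.
entries = [block.strip() for block in pat.findall(txt)]
print(entries)
# ['1\n2\n3', '4\n5', '6\n7\n8']
```

The \Z alternative is what lets the last entry, which has no closing dashes, still be captured.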
Upvotes: 1
Reputation: 89629
You can use this to avoid a reluctant quantifier. It's a trick to emulate an atomic group: (?=(...))\1. It's not strictly on topic, but it can be useful:
myre = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')
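A runnable sketch of that pattern follows. It is written as a raw string so that \1 reaches the regex engine as a backreference instead of becoming the control character chr(1). The sample text is made up, with a stray "-2" line to show that single dashes inside an entry are allowed:

```python
import re

# (?=(X))\1 matches X, but the lookahead is not re-entered on backtracking,
# so each consumed chunk behaves like an atomic group. Each chunk is either
# a run of non-dash characters or a single dash not followed by four more.
myre = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')

text = "Record Time\n1\n-2\n-----\nRecord Time\n3\n-----\n"

# The pattern contains a capturing group, so findall() would return only
# the last chunk captured by group 1; use finditer() and group(0) instead.
records = [m.group(0) for m in myre.finditer(text)]
print(records)
# ['Record Time\n1\n-2\n-----', 'Record Time\n3\n-----']
```

On Python 3.11 and later, the re module supports atomic groups directly, so the repeated part could be written as (?>[^-]+|-(?!-{4}))+ without the lookahead trick.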
Upvotes: 1
Reputation: 500883
You need to turn the .* into a reluctant (non-greedy) match by adding a question mark:
.*?
Otherwise .* matches as much as it can, so the single match runs from the start of the first record to the end of the last record.
See Greedy vs. Reluctant vs. Possessive Quantifiers
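To see the difference, the question's pattern can be run with and without the question mark on a small made-up input:

```python
import re

text = "Record Time\n1\n-----\nRecord Time\n2\n-----\n"

greedy = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
reluctant = re.compile(r'Record\sTime.*?-{5}', re.DOTALL)

# Greedy .* runs to the end of the text and backtracks only as far as the
# last '-----', so both records come back as one match.
print(greedy.findall(text))
# ['Record Time\n1\n-----\nRecord Time\n2\n-----']

# Reluctant .*? stops at the first '-----' after each 'Record Time'.
print(reluctant.findall(text))
# ['Record Time\n1\n-----', 'Record Time\n2\n-----']
```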
Upvotes: 5