Reputation: 2401
I have the following pattern:
1MHG161 xxxxxxxxxxxxx 1MHG161 xxx
where xxxx is a variable-length run of characters and spaces.
I am trying to capture each one and have the following expected output:
[ '1MHG161 xxxxxxxxxxxxx ' , '1MHG161 xxx' ]
I have tried many combinations; this is the latest one:
messages_strings = re.findall("(1MHG161.+?)(?=1MHG161)",content)
This finds all except the last one.
I have taken @anubhava's answer a little further to solve the same problem with dynamic delimiters, using \d[A-Z]{3}\d{3} instead of 1MHG161. This may help people working with EDI parsers.
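The dynamic-delimiter variation described above could look roughly like the sketch below. The sample string and the second segment ID (2ABC345) are made up for illustration, and the delimiter shape (one digit, three uppercase letters, three digits) is taken from the pattern quoted above; real EDI data may need a stricter expression.

```python
import re

# Hypothetical sample: the second segment ID differs from the first
# to show that the delimiter is no longer hard-coded.
content = "1MHG161 xxxxxxxxxxxxx 2ABC345 xxx"

# Delimiter shape from the post: digit, three uppercase letters, three digits.
delimiter = r"\d[A-Z]{3}\d{3}"

# Capture a delimiter plus the shortest run of text that is followed
# by either the next delimiter or the end of the string.
pattern = rf"({delimiter}.+?)(?={delimiter}|$)"

messages_strings = re.findall(pattern, content)
print(messages_strings)  # ['1MHG161 xxxxxxxxxxxxx ', '2ABC345 xxx']
```

Note that any payload text that happens to match the delimiter shape would also be treated as the start of a new message.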
Upvotes: 1
Views: 126
Reputation: 784958
You can use:
>>> re.findall(r"(1MHG161.+?)(?=1MHG161|$)", content)
['1MHG161 xxxxxxxxxxxxx ', '1MHG161 xxx']
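The lookahead (?=1MHG161|$) asserts that your match is followed by either 1MHG161 or the anchor $ (the end of the string, or the end of a line with re.MULTILINE), without consuming it. The original pattern missed the last message because nothing after it satisfied (?=1MHG161).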
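As a quick check, here is a minimal side-by-side comparison of the two patterns on the sample string from the question. The variable names without_end and with_end are only for illustration.

```python
import re

content = "1MHG161 xxxxxxxxxxxxx 1MHG161 xxx"

# Original attempt: the last message is never followed by another
# 1MHG161, so the lookahead fails and that message is dropped.
without_end = re.findall(r"(1MHG161.+?)(?=1MHG161)", content)

# With `|$` the end of the string also terminates a message.
with_end = re.findall(r"(1MHG161.+?)(?=1MHG161|$)", content)

print(without_end)  # ['1MHG161 xxxxxxxxxxxxx ']
print(with_end)     # ['1MHG161 xxxxxxxxxxxxx ', '1MHG161 xxx']
```

If the messages can span several lines, passing re.DOTALL so that . also matches newlines is probably needed.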
Upvotes: 3