Here is the problem:
string = 'abcdefghijklmn opabcedfg'
desired_result = ['abcdefghijklmn op', 'abcedfg']
I am searching for "abc" with the regular expression re.compile(r"abc") and then splitting the string on this pattern. This gives: ['abc', 'defghijklmn op', 'abc', 'edfg']
Can I adjust my regex to reach the desired split?
Thanks!
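For reference, here is a quick check of what re.split does with this input (assuming Python 3). A plain pattern drops the delimiter, while a capturing group keeps each "abc" as a separate list item, so the output quoted above resembles the capturing-group version. In neither case does "abc" stay attached to the text that follows it.

```python
import re

s = 'abcdefghijklmn opabcedfg'

# Plain pattern: every matched "abc" is removed from the output.
plain = re.split(r'abc', s)

# Capturing group: every matched "abc" is kept as its own list item.
captured = re.split(r'(abc)', s)

print(plain)     # ['', 'defghijklmn op', 'edfg']
print(captured)  # ['', 'abc', 'defghijklmn op', 'abc', 'edfg']
```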
You can use a regex similar to this one:
abc[^a]*(?:a(?!bc)[^a]*)*
It collects every substring that starts with abc and extends up to the next abc or the end of the string.
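To make the "from one abc up to the next" behaviour concrete, here is a sketch of the same idea without regular expressions: str.find locates each occurrence and the text is sliced between consecutive occurrences. The helper name chunks_from is hypothetical and only for illustration.

```python
def chunks_from(text, marker):
    """Hypothetical helper: return the substrings that start at each
    occurrence of marker and end just before the next occurrence
    (or at the end of text). Text before the first marker is dropped."""
    if not marker:
        raise ValueError("marker must be non-empty")
    result = []
    start = text.find(marker)
    while start != -1:
        # Search for the next marker after the current one (non-overlapping).
        nxt = text.find(marker, start + len(marker))
        end = nxt if nxt != -1 else len(text)
        result.append(text[start:end])
        start = nxt
    return result


print(chunks_from('abcdefghijklmn opabcedfg', 'abc'))
# ['abcdefghijklmn op', 'abcedfg']
```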
Regex breakdown:
abc - match abc
[^a]* - match 0 or more characters other than a
(?:a(?!bc)[^a]*)* - match (but do not capture) 0 or more sequences of:
  a(?!bc) - match an a that is not followed by bc (since we are matching only up to the next abc)
  [^a]* - match 0 or more characters other than a
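The breakdown above maps one-to-one onto the pattern when it is written with Python's re.VERBOSE flag, which ignores whitespace and allows # comments inside the pattern. This is the same regex, only laid out differently.

```python
import re

# Same pattern as abc[^a]*(?:a(?!bc)[^a]*)*, written with re.VERBOSE so each
# piece of the breakdown sits next to its comment. Whitespace is ignored.
pattern = re.compile(r"""
    abc            # match the literal "abc"
    [^a]*          # 0 or more characters other than "a"
    (?:            # non-capturing group, repeated 0 or more times:
        a(?!bc)    #   an "a" that is not followed by "bc"
        [^a]*      #   0 or more characters other than "a"
    )*
""", re.VERBOSE)

print(pattern.findall('abcdefghijklmn opabcedfg'))
# ['abcdefghijklmn op', 'abcedfg']
```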
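It is similar to what abc.*?(?=$|abc) would capture, but it avoids the issues associated with lazy dot matching: the lazy version re-tests the lookahead before every character it consumes, and . does not match a newline unless re.DOTALL is set, whereas [^a] does.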
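A small comparison of the two patterns, assuming Python 3's re module: both give the same result on the question's string, but they diverge once the input contains a newline. The second sample string is made up for illustration; the snippet does not measure the performance difference.

```python
import re

unrolled = re.compile(r'abc[^a]*(?:a(?!bc)[^a]*)*')
lazy = re.compile(r'abc.*?(?=$|abc)')

sample = 'abcdefghijklmn opabcedfg'
print(unrolled.findall(sample))  # ['abcdefghijklmn op', 'abcedfg']
print(lazy.findall(sample))      # ['abcdefghijklmn op', 'abcedfg']

# Without re.DOTALL, "." cannot cross the newline, so the lazy pattern
# loses the first chunk entirely; [^a] matches the newline without trouble.
multiline = 'abcx\nyabcz'
print(unrolled.findall(multiline))  # ['abcx\ny', 'abcz']
print(lazy.findall(multiline))      # ['abcz']
```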
import re
p = re.compile(r'abc[^a]*(?:a(?!bc)[^a]*)*')
test_str = "abcdefghijklmn opabcedfg"
print(p.findall(test_str))
Results: ['abcdefghijklmn op', 'abcedfg']
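If you would rather keep using re.split, as the question asks, one alternative (not part of the answer above) is to split on a zero-width lookahead, so that each "abc" stays attached to the piece that follows it. This sketch assumes Python 3.7 or later, where re.split accepts patterns that can match an empty string.

```python
import re

s = 'abcdefghijklmn opabcedfg'

# Split at the empty position just before each "abc"; the match at
# position 0 produces a leading '' that the filter removes.
parts = [part for part in re.split(r'(?=abc)', s) if part]

print(parts)  # ['abcdefghijklmn op', 'abcedfg']
```

Unlike the findall approach, this also keeps any text that precedes the first abc: for 'xxabcd' it gives ['xx', 'abcd'] rather than ['abcd'].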