user2101984

Reputation: 67

Python regex split before regular expression

Here is the problem:

string = 'abcdefghijklmn opabcedfg'

desired_result = ['abcdefghijklmn op', 'abcedfg']

I am looking for "abc" with the regular expression re.compile(r"abc") and then splitting on that regex. This gives: ['abc','defghijklmn op','abc','edfg']

Can I adjust my regex to reach the desired split?

Thanks!

Upvotes: 1

Views: 443

Answers (1)

Wiktor Stribiżew

Reputation: 626794

You can use a regex similar to this one:

abc[^a]*(?:a(?!bc)[^a]*)*

It collects every substring that starts with abc and runs up to (but not including) the next abc, or to the end of the string.

Regex breakdown:

  • abc - match abc
  • [^a]* - match 0 or more characters other than a
  • (?:a(?!bc)[^a]*)* - match (but not capture) 0 or more sequences of
    • a(?!bc) - match an a that is not followed by bc (as we are matching up to the next abc)
    • [^a]* - match 0 or more characters other than a
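
The breakdown above can also be written into the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores whitespace in the pattern and allows # comments; the names UNROLLED and chunks are only illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each piece can be commented.
UNROLLED = re.compile(
    r"""
    abc             # the literal text that starts each chunk
    [^a]*           # any run of characters other than a
    (?:             # non-capturing group, repeated zero or more times:
        a(?!bc)     #   an a that does not begin another abc
        [^a]*       #   followed by another run of characters other than a
    )*
    """,
    re.VERBOSE,
)

chunks = UNROLLED.findall("abcdefghijklmn opabcedfg")
print(chunks)  # ['abcdefghijklmn op', 'abcedfg']
```

The verbose form compiles to the same expression as the one-line version, so it can be used in its place wherever readability matters more than brevity.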

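On single-line input it matches the same text as abc.*?(?=$|abc), but it avoids the drawbacks of lazy dot matching: the lazy version re-checks the lookahead after every character it consumes, and . does not match a newline unless re.DOTALL is set.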
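
One difference between the two patterns that is easy to check is how they treat newlines. The sketch below compares them on the question's string and on a made-up two-line string (multi_line is a hypothetical input, not from the question), with no flags set on either pattern.

```python
import re

unrolled = re.compile(r"abc[^a]*(?:a(?!bc)[^a]*)*")
lazy = re.compile(r"abc.*?(?=$|abc)")

# On the single-line string from the question both patterns agree.
single_line = "abcdefghijklmn opabcedfg"
same_on_one_line = unrolled.findall(single_line) == lazy.findall(single_line)
print(same_on_one_line)  # True

# With a newline inside a chunk, [^a] still matches it, but . does not
# (no re.DOTALL here), so the lazy pattern loses the first chunk.
multi_line = "abcde\nfgabchi"
unrolled_chunks = unrolled.findall(multi_line)
lazy_chunks = lazy.findall(multi_line)
print(unrolled_chunks)  # ['abcde\nfg', 'abchi']
print(lazy_chunks)      # ['abchi']
```

Compiling the lazy pattern with re.DOTALL would remove this particular difference; the unrolled pattern needs no flag because a negated character class matches newlines by default.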

Python code demo:

import re

# Each match starts at "abc" and runs up to the next "abc" or the end of the string
p = re.compile(r'abc[^a]*(?:a(?!bc)[^a]*)*')
test_str = "abcdefghijklmn opabcedfg"
print(p.findall(test_str))

Results: ['abcdefghijklmn op', 'abcedfg']

Upvotes: 1
