dpdenton
dpdenton

Reputation: 310

Split string on portion of matched regular expression (python)

Say I have a string 'ad>ad>ad>>ad' and I want to split on this on the '>' (not the '>>' chars). Just picked up regex and was wondering if there is a way (special character) to split on a specific part of the matched expression, rather than splitting on the whole matched expression, for example the regex could be:

re.split('[^>]>[^>]', 'ad>ad>ad>>ad')

Can you get it to split on the char in parenthesis [^>](>)[^>] ?

Upvotes: 3

Views: 222

Answers (2)

user2705585
user2705585

Reputation:

Try with \b>\b

This will check for single > surrounded by non-whitespace characters. As the string in the question is continuous stream of characters checking word boundary with \b is simplest method.

Regex101 Demo

Upvotes: 1

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626728

You need to use lookarounds:

re.split(r'(?<!>)>(?!>)', 'ad>ad>ad>>ad')

See the regex demo

The (?<!>)>(?!>) pattern only matches a > that is not preceded with a < (due to the negative lookbehind (?<!>)) and that is not followed with a < (due to the negative lookahead (?!>)).

Since lookarounds do not consume the characters (unlike negated (and positive) character classes, like [^>]), we only match and split on a < symbol without "touching" the symbols around it.

Upvotes: 2

Related Questions