Symon

Reputation: 1738

Split string with regex separator except when separator is escaped

I have the following code (treat 'Z' as the escape character and ',' as the separator):

import re

a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(re.split(r'(?<!Z)[,]+', a))

Result is:

['aaa', 'bbbZ,cccZZ,dddZZZ,eee']

But I need the result to respect escape sequences (in my example the escape character is 'Z'):

['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']

When I try to use a variable-width pattern in the negative lookbehind assertion:

print(re.split(r'(?<!(ZZ)*Z)[,]+', a))

it says:

sre_constants.error: look-behind requires fixed-width pattern

Upvotes: 2

Views: 789

Answers (1)

Wiktor Stribiżew

Reputation: 626929

Instead of splitting, you may match the fields themselves with a pattern that matches either any character that is not a comma, or 1 or more commas preceded by an odd number of Zs:

import re
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(re.findall(r'(?:(?<!Z)Z(?:ZZ)*,+|[^,])+', a))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']


Pattern details:

  • (?:(?<!Z)Z(?:ZZ)*,+|[^,])+ - 1 or more occurrences of either alternative:
    • (?<!Z)Z - a Z not immediately preceded by another Z
    • (?:ZZ)* - zero or more ZZ pairs (so the whole run of Zs has odd length)
    • ,+ - 1 or more commas
    • | - or
    • [^,] - any character that is not a comma
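The same pattern can be built for other separator and escape characters. The following is only a sketch: the function name split_unescaped and its sep and esc parameters are hypothetical, both are assumed to be single characters, and the escape characters are left in the returned fields, as in the example above.

```python
import re

def split_unescaped(text, sep=',', esc='Z'):
    """Return the fields of text; a sep after an odd run of esc is kept as data.

    Hypothetical helper: sep and esc are assumed to be single characters.
    """
    s, e = re.escape(sep), re.escape(esc)
    # Alternative 1: an odd-length run of esc followed by 1+ separators.
    # Alternative 2: any single character that is not the separator.
    pattern = r'(?:(?<!{e}){e}(?:{e}{e})*{s}+|[^{s}])+'.format(s=s, e=e)
    return re.findall(pattern, text)

print(split_unescaped('aaa,bbbZ,cccZZ,dddZZZ,eee'))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
```

Because this relies on findall rather than split, empty fields (for example from a leading or trailing separator) are not reported.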

With the PyPI regex module, which supports variable-width lookbehind, you may use the regex.split method with a (?<=(?<!Z)(?:ZZ)*),+ regex:

import regex
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(regex.split(r'(?<=(?<!Z)(?:ZZ)*),+', a))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']


Here, the pattern matches 1 or more commas (,+) that are immediately preceded by zero or more ZZ pairs that are not themselves preceded by another Z (that is, by an even number of Zs, zero included).
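The even/odd rule can also be checked without any regex. The sketch below is not part of either approach above; it is a plain scan, with a hypothetical function name and parameters, that counts the run of escape characters before each comma and is meant to behave like the regex.split call, including treating a run of commas as one separator.

```python
def split_on_unescaped(text, sep=',', esc='Z'):
    """Split text on runs of sep that follow an even number of esc characters.

    Hypothetical helper: sep and esc are assumed to be single characters.
    """
    fields, current = [], []
    run = 0  # length of the esc run immediately before the current position
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == sep and run % 2 == 0:
            # Unescaped separator: consume the whole run of separators.
            while i < len(text) and text[i] == sep:
                i += 1
            fields.append(''.join(current))
            current = []
            run = 0
            continue
        # Ordinary character, or a separator escaped by an odd esc run.
        current.append(ch)
        run = run + 1 if ch == esc else 0
        i += 1
    fields.append(''.join(current))
    return fields

print(split_on_unescaped('aaa,bbbZ,cccZZ,dddZZZ,eee'))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
```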

Upvotes: 6
