Reputation: 1738
I have the following code (treat 'Z' as the escape character and ',' as the separator):
import re
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print re.split(r'(?<!Z)[,]+', a)
Result is:
['aaa', 'bbbZ,cccZZ,dddZZZ,eee']
But I need the result to respect escaped sequences (in my example the escape character is 'Z'):
['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
When I try to use a variable-width pattern in the negative lookbehind assertion:
print re.split(r'(?<!(ZZ)*Z)[,]+', a)
it says:
sre_constants.error: look-behind requires fixed-width pattern
Upvotes: 2
Views: 789
Reputation: 626929
You can match the fields themselves with a pattern that matches either any character that is not a comma, or one or more commas preceded by an odd number of Zs:
import re
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(re.findall(r'(?:(?<!Z)Z(?:ZZ)*,+|[^,])+', a))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
See the Python demo and a regex demo.
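As a rough generalization, the same findall approach can be wrapped in a helper that builds the pattern for other separator and escape characters. This is only a sketch: the name find_fields is hypothetical, and it assumes both sep and esc are single characters. Like the pattern above, it returns the matched fields, so empty fields are not reported.

```python
import re

def find_fields(text, sep=',', esc='Z'):
    """Return the fields of text, treating sep as escaped when it follows
    an odd number of esc characters.

    Assumption (not from the original answer): sep and esc are single characters.
    """
    s, e = re.escape(sep), re.escape(esc)
    # Same shape as (?:(?<!Z)Z(?:ZZ)*,+|[^,])+ with Z -> esc and , -> sep
    pattern = r'(?:(?<!{e}){e}(?:{e}{e})*{s}+|[^{s}])+'.format(e=e, s=s)
    return re.findall(pattern, text)

print(find_fields('aaa,bbbZ,cccZZ,dddZZZ,eee'))
```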
Pattern details:
(?:(?<!Z)Z(?:ZZ)*,+|[^,])+ - 1 or more occurrences of:
- (?<!Z)Z - a Z not immediately preceded by another Z
- (?:ZZ)* - zero or more sequences of ZZ
- ,+ - 1 or more commas
- | - or
- [^,] - any char that is not a comma

With the PyPI regex module, you may use the regex.split method with a (?<=(?<!Z)(?:ZZ)*),+ regex:
import regex
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(regex.split(r'(?<=(?<!Z)(?:ZZ)*),+', a))
# ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
See another online Python demo.
Here, the pattern matches 1 or more commas (,+) that are preceded by zero or more sequences of ZZ that are not themselves preceded by another Z (that is, by an even number of Zs).
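The even/odd rule can also be checked without any lookbehind by scanning the string and counting the run of escape characters before each separator. The following is a minimal sketch using only plain Python; the function name split_on_unescaped and its parameters are hypothetical, and it assumes single-character sep and esc. A run of separators is consumed as one split point, mirroring ,+.

```python
def split_on_unescaped(text, sep=',', esc='Z'):
    """Split text on runs of sep that follow an even number of esc characters.

    Assumption (not from the original answer): sep and esc are single characters.
    """
    parts = []
    start = 0   # index where the current field begins
    run = 0     # length of the esc run immediately before position i
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == sep and run % 2 == 0:
            # Unescaped separator: close the current field
            parts.append(text[start:i])
            # Consume the whole run of separators, like ,+
            while i < len(text) and text[i] == sep:
                i += 1
            start = i
            run = 0
            continue
        # Extend the esc run, or reset it on any other character
        run = run + 1 if ch == esc else 0
        i += 1
    parts.append(text[start:])
    return parts

print(split_on_unescaped('aaa,bbbZ,cccZZ,dddZZZ,eee'))
# ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
```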
Upvotes: 6