Reputation: 145
I want to split some text by these delimiters: ",", ";", and " y " (the surrounding whitespace is required).
It should also ignore any delimiters within parentheses.
Here's what I've tried for the first two:
re.split(r'[,;]+(?![^(]*\))', text_spam)
'foo, bar; baz spam y eggs guido'
should split into ['foo', ' bar', ' baz spam', 'eggs guido']
I can't figure out how to include a multi-character string alongside the character set to handle the last delimiter.
TIA
Upvotes: 1
Views: 132
Reputation: 627082
You can use a non-capturing group with the alternation operator | to add a multi-character string as an alternative to the character class, and apply the + quantifier to the whole group:
r'(?:[,;]| y )+(?![^(]*\))'
See the regex demo
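As a quick check, here is a minimal sketch that applies this pattern to the sample string from the question. The second input, with a parenthesized group, is a hypothetical example that does not come from the question; the variable names are illustrative only.

```python
import re

# Pattern from the answer: one or more delimiters, rejected when a ")"
# appears ahead with no "(" before it (i.e. we are inside parentheses).
pattern = re.compile(r'(?:[,;]| y )+(?![^(]*\))')

# Sample string from the question.
plain_parts = pattern.split('foo, bar; baz spam y eggs guido')
print(plain_parts)
# ['foo', ' bar', ' baz spam', 'eggs guido']

# Hypothetical input: delimiters inside the parentheses are left alone.
grouped_parts = pattern.split('a, b (c, d; e y f); g')
print(grouped_parts)
# ['a', ' b (c, d; e y f)', ' g']
```

Note that the pieces keep their leading whitespace, because only the delimiter itself is consumed.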
To strip whitespace from the resulting items and drop any empty ones, you can use:
import re
text = "foo, bar; baz spam y eggs guido (foo, bar; baz spam y eggs guido)"
results = re.split(r'(?:[,;]\s*| y )+(?![^(]*\))', text)
print(list(filter(None, [x.strip() for x in results])))
# => ['foo', 'bar', 'baz spam', 'eggs guido (foo, bar; baz spam y eggs guido)']
See the Python demo
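One limitation worth keeping in mind: the lookahead only checks whether a ")" comes before the next "(", so it can misfire on nested parentheses such as "a (b, (c) d)". If nesting matters for your data, a different technique, a plain depth-counting scan instead of a regex, may be safer. The sketch below is only an illustration under that assumption; split_outside_parens is a hypothetical helper name, not part of the re module.

```python
def split_outside_parens(text, delimiters=(",", ";", " y ")):
    """Split text on the delimiters, skipping any inside parentheses.

    Hypothetical helper: tracks nesting depth by hand instead of using a regex.
    """
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if depth == 0:
            # Only look for a delimiter when we are outside all parentheses.
            hit = next((d for d in delimiters if text.startswith(d, i)), None)
            if hit is not None:
                parts.append("".join(current))
                current = []
                i += len(hit)
                continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    # Strip whitespace and drop empty pieces, as in the answer above.
    return [p.strip() for p in parts if p.strip()]


nested_parts = split_outside_parens("a (b, (c) d), e y f")
print(nested_parts)
# ['a (b, (c) d)', 'e', 'f']
```

For input without nested parentheses, the regex above is shorter and should be enough.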
Upvotes: 4