road_rash

Reputation: 145

Python regex multicharacter match inside set

I want to split some text on these delimiters: ",", ";", and " y " (the surrounding whitespace is required).

The split should also ignore any delimiters inside parentheses.

Here's what I've tried for the first two:

re.split(r'[,;]+(?![^(]*\))', text_spam)

'foo, bar; baz spam y eggs guido' should split into ['foo', ' bar', ' baz spam', 'eggs guido']

I can't figure out how to include a multicharacter string inside the set to get the last delimiter.

TIA

Upvotes: 1

Views: 132

Answers (1)

Wiktor Stribiżew

Reputation: 627082

A character set can only match single characters, so use a non-capturing group with the alternation operator | to offer the multi-character string as an alternative to the set, and apply the + quantifier to the whole group:

r'(?:[,;]| y )+(?![^(]*\))'
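
As a quick check, here is a minimal sketch that runs this pattern on the sample string from the question. The names PATTERN, sample, guarded and with_capture, and the second input string, are only illustrative. The last call is there to suggest why the group is non-capturing: re.split adds the text of any capturing group to its result.

```python
import re

# Pattern from the answer: one or more delimiters, not followed by a ")"
# that has no "(" before it (i.e. the delimiter is not inside parentheses).
PATTERN = r'(?:[,;]| y )+(?![^(]*\))'

sample = 'foo, bar; baz spam y eggs guido'
parts = re.split(PATTERN, sample)
print(parts)  # ['foo', ' bar', ' baz spam', 'eggs guido']

# Delimiters inside parentheses are left alone (illustrative input).
guarded = re.split(PATTERN, 'a, b (c, d y e)')
print(guarded)  # ['a', ' b (c, d y e)']

# With a capturing group, re.split also returns the matched delimiters.
with_capture = re.split(r'([,;]| y )+(?![^(]*\))', sample)
print(with_capture)  # ['foo', ',', ' bar', ';', ' baz spam', ' y ', 'eggs guido']
```

The first result matches the list the question asks for, including the leading spaces that the pattern does not consume.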


You can then strip whitespace from the resulting items and drop any empty ones:

import re
text = "foo, bar; baz spam y eggs guido (foo, bar; baz spam y eggs guido)"
# \s* also consumes whitespace that follows a comma or semicolon
results = re.split(r'(?:[,;]\s*| y )+(?![^(]*\))', text)
# strip each item; filter(None, ...) drops the empty strings
print(list(filter(None, [x.strip() for x in results])))
# => ['foo', 'bar', 'baz spam', 'eggs guido (foo, bar; baz spam y eggs guido)']


Upvotes: 4
