Reputation: 1344
I'm working on a simple wiki engine, and I am wondering if there is an efficient way to split a string into a list based on a separator, but only where that separator is not enclosed in double square brackets or double curly brackets.
So, a string like this:
"|Row 1|[[link|text]]|{{img|altText}}|"
Would get converted to a list like this:
['Row 1', '[[link|text]]', '{{img|altText}}']
EDIT: Removed the spaces from the example string, since they were causing confusion.
Upvotes: 1
Views: 231
Reputation: 214949
Tim's expression is elaborate, but you can usually greatly simplify "split" expressions by converting them to "match" ones:
import re
s = "|Row 1|[[link|text|df[sdfl|kj]|foo]]|{{img|altText|{|}|bar}}|"
# Match a whole [[...]] block, a whole {{...}} block, or a run of non-pipe characters
print(re.findall(r'\[\[.+?\]\]|{{.+?}}|[^|]+', s))
# ['Row 1', '[[link|text|df[sdfl|kj]|foo]]', '{{img|altText|{|}|bar}}']
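One consequence of matching instead of splitting is worth spelling out: `[^|]+` needs at least one character, so empty cells are silently dropped rather than returned as empty strings. A small sketch that shows this, using the same pattern as above (the helper name `split_cells` is made up for illustration):

```python
import re

# Same pattern as in the answer above; CELL and split_cells are illustrative names.
CELL = re.compile(r'\[\[.+?\]\]|{{.+?}}|[^|]+')

def split_cells(line):
    """Return the non-empty cells of a pipe-separated wiki row."""
    return CELL.findall(line)

print(split_cells("|Row 1|[[link|text]]|{{img|altText}}|"))
# An empty cell between two adjacent pipes is not reported at all:
print(split_cells("|a||b|"))
```

If empty cells matter for your tables, the split-based approach is probably the safer choice.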
Upvotes: 1
Reputation: 336128
You can use
import re

def split_special(subject):
    return re.split(r"""
        \|           # Match |
        (?!          # only if it's not possible to match...
         (?:         # the following non-capturing group:
          (?!\[\[)   # that doesn't contain two opening square brackets
          .          # but may otherwise contain any character
         )*          # any number of times,
         \]\]        # followed by ]]
        )            # End of first lookahead. Now the same thing for braces:
        (?!(?:(?!\{\{).)*\}\})""",
        subject, flags=re.VERBOSE)
Result:
>>> s = "|Row 1|[[link|text|df[sdfl|kj]|foo]]|{{img|altText|{|}|bar}}|"
>>> split_special(s)
['', 'Row 1', '[[link|text|df[sdfl|kj]|foo]]', '{{img|altText|{|}|bar}}', '']
Note the leading and trailing empty strings: they need to be there because they do exist before the first and after the last | in the test string.
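If the outer empty strings are unwanted, one option is to strip the leading and trailing separators before splitting. This is only a sketch: it uses the same two lookaheads in compact form, the names `PIPE_OUTSIDE_MARKUP` and `split_row` are mine, and it assumes you never need an empty first or last cell, because `strip("|")` removes every outer pipe.

```python
import re

# Same lookaheads as split_special above, written compactly.
PIPE_OUTSIDE_MARKUP = re.compile(r"""
    \|                        # a pipe ...
    (?!(?:(?!\[\[).)*\]\])    # ... with no unopened ]] ahead of it
    (?!(?:(?!\{\{).)*\}\})    # ... and no unopened }} ahead of it
    """, re.VERBOSE)

def split_row(row):
    # Remove the outer pipes first so no empty first/last element is produced.
    return PIPE_OUTSIDE_MARKUP.split(row.strip("|"))

print(split_row("|Row 1|[[link|text]]|{{img|altText}}|"))
```

Empty cells in the middle of a row (two adjacent pipes) are still returned as empty strings with this approach.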
Upvotes: 3
Reputation: 13642
Is it possible for a cell to contain something like Row 1|[? If not, and the separator is always surrounded by spaces as in your example above, you can simply do
split(" | ")
Upvotes: -2