Zauberin Stardreamer

Reputation: 1344

Split string into list if separator is not enclosed

I'm working on a simple wiki engine, and I am wondering whether there is an efficient way to split a string into a list on a separator, but only where that separator is not enclosed in double square brackets or double curly brackets.

So, a string like this:

"|Row 1|[[link|text]]|{{img|altText}}|"

Would get converted to a list like this:

['Row 1', '[[link|text]]', '{{img|altText}}']

EDIT: Removed the spaces from the example string, since they were causing confusion.

Upvotes: 1

Views: 231

Answers (3)

georg

Reputation: 214949

Tim's expression is elaborate, but you can usually greatly simplify "split" expressions by converting them to "match" ones:

import re
s = "|Row 1|[[link|text|df[sdfl|kj]|foo]]|{{img|altText|{|}|bar}}|"

print(re.findall(r'\[\[.+?\]\]|{{.+?}}|[^|]+', s))

# ['Row 1', '[[link|text|df[sdfl|kj]|foo]]', '{{img|altText|{|}|bar}}']
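One thing to keep in mind with the "match" approach: because `[^|]+` needs at least one character, empty cells are dropped silently, and a cell that mixes plain text with a bracketed link is cut at the inner `|`. A minimal sketch that wraps the same pattern in a function (the name `split_cells` is only an illustration, not part of the answer) and shows both effects:

```python
import re

# Same pattern as above, precompiled. The helper name is hypothetical.
CELL = re.compile(r'\[\[.+?\]\]|{{.+?}}|[^|]+')

def split_cells(row):
    """Return the cells of a row, keeping [[...]] and {{...}} intact."""
    return CELL.findall(row)

print(split_cells("|Row 1|[[link|text]]|{{img|altText}}|"))
# ['Row 1', '[[link|text]]', '{{img|altText}}']

print(split_cells("|a||b|"))
# ['a', 'b']  (the empty cell between a and b is lost)

print(split_cells("|see [[a|b]]|"))
# ['see [[a', 'b]]']  (a link preceded by text in the same cell is split)
```

If empty cells or mixed cells matter for your tables, a split-based expression is probably the safer choice.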

Upvotes: 1

Tim Pietzcker

Reputation: 336128

You can use

import re

def split_special(subject):
    return re.split(r"""
        \|           # Match |
        (?!          # only if it's not possible to match...
         (?:         # the following non-capturing group:
          (?!\[\[)   # that doesn't contain two opening square brackets
          .          # but may otherwise contain any character
         )*          # any number of times,
         \]\]        # followed by ]]
        )            # End of first lookahead. Now the same thing for braces:
        (?!(?:(?!\{\{).)*\}\})""", 
        subject, flags=re.VERBOSE)

Result:

>>> s = "|Row 1|[[link|text|df[sdfl|kj]|foo]]|{{img|altText|{|}|bar}}|"
>>> split_special(s)
['', 'Row 1', '[[link|text|df[sdfl|kj]|foo]]', '{{img|altText|{|}|bar}}', '']

Note the leading and trailing empty strings: they are there because the test string starts and ends with |, so there is an empty field before the first separator and another after the last one.
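If you want the list exactly as shown in the question, you can drop those two outer empty strings afterwards. A small sketch, assuming rows always start and end with | (the names `SEPARATOR` and `split_row` are made up for this example; the regex is the one above, written on one line):

```python
import re

# The split expression from above in compact form.
SEPARATOR = re.compile(r"\|(?!(?:(?!\[\[).)*\]\])(?!(?:(?!\{\{).)*\}\})")

def split_row(row):
    """Split a row on unenclosed | and drop the outer empty fields."""
    parts = SEPARATOR.split(row)
    # Remove only the first and last element, and only if they are empty,
    # so that empty cells in the middle of the row are preserved.
    if parts and parts[0] == "":
        parts = parts[1:]
    if parts and parts[-1] == "":
        parts = parts[:-1]
    return parts

print(split_row("|Row 1|[[link|text]]|{{img|altText}}|"))
# ['Row 1', '[[link|text]]', '{{img|altText}}']

print(split_row("|a||b|"))
# ['a', '', 'b']
```

Filtering with a list comprehension such as `[p for p in parts if p]` would also work, but it would remove empty cells in the middle of a row as well.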

Upvotes: 3

Tommy

Reputation: 13642

Is it possible to have Row 1|[? If the separator is always surrounded by spaces, as in your example above, you can do

s.split(" | ")
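A rough sketch of how that could look. The spaced input string below is only my guess at the pre-edit example, and `split_spaced_row` is a made-up name. It relies on the pipes inside [[...]] and {{...}} never having spaces around them:

```python
def split_spaced_row(row):
    """Split on ' | ', assuming enclosed pipes are never surrounded by spaces."""
    # Trim the outer pipes and whitespace first, then split on the spaced separator.
    inner = row.strip().strip("|").strip()
    return inner.split(" | ")

print(split_spaced_row("| Row 1 | [[link|text]] | {{img|altText}} |"))
# ['Row 1', '[[link|text]]', '{{img|altText}}']

# As soon as an enclosed pipe has spaces around it, the link is cut in two:
print(split_spaced_row("| a | [[x | y]] |"))
# ['a', '[[x', 'y]]']
```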

Upvotes: -2
