4Looped

Reputation: 57

How to split a string using closing brackets as the separator

If I have a messy string like '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]' and I want to split it into a list so that each part inside any pair of brackets becomes an item, like ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach'], how would I do this? I can't figure out a way to make the .split() method work.

Upvotes: 0

Views: 231

Answers (3)

DarrylG

Reputation: 17166

You can use a regular expression:

import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst)

Output

['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']

Explanation

Regex pattern to match

r'\[(.*?)\]|\((.*?)\)'

Subpattern 1: To match items in square brackets i.e. [...]

\[(.*?)\]  # \[ and \] match literal square brackets; [ and ] are
           # special characters in a regex, so they must be escaped
 (.*?)     # lazy match: captures as few characters as possible

Subpattern 2: To match items in parentheses, i.e. (...)

\((.*?)\)   # \( and \) match literal parentheses; ( and ) are
            # special characters in a regex, so they must be escaped

Since we are looking for either of the two patterns, we use:

'|'         # alternation ("or") between the two subpatterns,
            # to match Subpattern 1 or Subpattern 2

The expression

re.findall(r'\[(.*?)\]|\((.*?)\)', s)

returns a list of tuples, one tuple per match, with one element per capture group:

[('Carrots', ''), ('Broccoli', ''), ('', 'cucumber'), ('', 'tomato'), ('spinach', '')]

The result is in the first or second element of each tuple; the group that did not match is an empty string. So we use:

[x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]

This picks the non-empty element of each tuple and places it into a list.
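A possible simplification, offered as a sketch rather than as part of the original answer: if both bracket types go into character classes, the pattern needs only one capture group, so re.findall returns plain strings instead of tuples. The name `items` is illustrative. This assumes brackets are never mismatched, since the pattern would also accept an opening '[' closed by ')'.

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

# [\[(]      one opening bracket, either [ or (
# ([^\])]*)  capture everything up to the next closing bracket
# [\])]      one closing bracket, either ] or )
items = re.findall(r'[\[(]([^\])]*)[\])]', s)
print(items)  # ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
```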

Upvotes: 2

Badgy

Reputation: 819

Assuming no brackets or operators (e.g. '-') other than the ones in your example string are used, try:

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

words = []
for elem in s.replace('-', ' ').split():
    if '[' in elem or '(' in elem:
        words.append(elem.strip('[]()'))

Or with a list comprehension:

words = [elem.strip('[]()') for elem in s.replace('-', ' ').split() if '[' in elem or '(' in elem]
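One more assumption worth spelling out: because this approach splits on whitespace, it only works when no bracketed item contains a space. The sketch below wraps the comprehension in a hypothetical helper, `bracketed_words`, to show both the working case and that limitation.

```python
def bracketed_words(s):
    # Same logic as the list comprehension above, wrapped in a function.
    return [elem.strip('[]()')
            for elem in s.replace('-', ' ').split()
            if '[' in elem or '(' in elem]

# Works when every bracketed item is a single word.
print(bracketed_words('[Carrots] (cucumber)-(tomato)'))
# ['Carrots', 'cucumber', 'tomato']

# An item containing a space is cut at the space: 'Beans]' has no
# opening bracket, so it is dropped.
print(bracketed_words('[Green Beans] (tomato)'))
# ['Green', 'tomato']
```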

Upvotes: 0

Lydia van Dyke

Reputation: 2516

Without any error handling whatsoever (like checking for nested or unbalanced brackets):

def parse(expr):
    opening = "(["
    closing = ")]"
    result = []
    current_item = ""
    for char in expr:
        if char in opening:
            current_item = ""
            continue
        if char in closing:
            result.append(current_item)
            continue
        current_item += char
    return result

print(parse("(a)(b) stuff (c) [d] more stuff - (xxx)."))

# prints ['a', 'b', 'c', 'd', 'xxx']

Depending on your needs, this might already be good enough...
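If the error handling mentioned above does matter, here is a minimal sketch of one way to add it. The function name `parse_strict` and the choice to raise ValueError are assumptions for illustration, not part of the original answer. It rejects nested, mismatched, and unclosed brackets.

```python
def parse_strict(expr):
    pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    result = []
    expected = None  # closing bracket we are waiting for; None when outside brackets
    current = []
    for ch in expr:
        if ch in pairs:
            if expected is not None:
                raise ValueError(f"nested bracket {ch!r}")
            expected = pairs[ch]
            current = []
        elif ch in closers:
            # Also triggers when a closer appears outside any bracket.
            if ch != expected:
                raise ValueError(f"unexpected closing bracket {ch!r}")
            result.append("".join(current))
            expected = None
        elif expected is not None:
            # Only collect characters while inside a bracket pair.
            current.append(ch)
    if expected is not None:
        raise ValueError("unclosed bracket")
    return result

print(parse_strict("(a)(b) stuff (c) [d] more stuff - (xxx)."))
# ['a', 'b', 'c', 'd', 'xxx']
```

Unlike the version above, text outside brackets is never accumulated, so a stray closing bracket raises an error instead of producing a junk item.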

Upvotes: 0
