Reputation: 57
If I have a messy string like '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]' and I want to split it into a list in which each part enclosed in any kind of bracket becomes an item, like ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach'], how would I do this? I can't figure out a way to make the .split() method work.
Upvotes: 0
Views: 231
Reputation: 17166
You can use a regular expression with re.findall:
import re
s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst)
Output
['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
Explanation
Regex pattern to match:
r'\[(.*?)\]|\((.*?)\)'
Subpattern 1: to match items in square brackets, i.e. [...]
\[(.*?)\]   # \[ and \] are escaped because [ and ] are special characters in regex;
            # escaping makes them match literally
(.*?)       # a lazy (non-greedy) match of any characters, captured as group 1
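To see why the lazy .*? matters, here is a small sketch comparing subpattern 1 on its own with a greedy .* version on the same example string. The variable names lazy and greedy are just illustrative.

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

# Lazy: .*? stops at the first closing bracket, so each [...] is a separate match
lazy = re.findall(r'\[(.*?)\]', s)
print(lazy)  # ['Carrots', 'Broccoli', 'spinach']

# Greedy: .* runs on to the LAST closing bracket, swallowing everything in between
greedy = re.findall(r'\[(.*)\]', s)
print(greedy)  # ['Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach']
```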
Subpattern 2: to match items in parentheses, i.e. (...)
\((.*?)\)   # \( and \) are escaped because ( and ) are special characters in regex;
            # escaping makes them match literally, and the inner (.*?) is group 2
Since we are looking for either of the two subpatterns, we join them with:
'|'   # alternation ("or") between the two subpatterns,
      # so a match is either Subpattern 1 or Subpattern 2
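Because the pattern contains two capture groups, the expression
re.findall(r'\[(.*?)\]|\((.*?)\)', s)
returns a list of tuples, one element per group, with '' for the group that did not match:
[('Carrots', ''), ('Broccoli', ''), ('', 'cucumber'), ('', 'tomato'), ('spinach', '')]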
Each item is in either the first or the second element of its tuple, and the other element is an empty string (which is falsy). So we use:
[x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
to pick whichever element is non-empty and collect the items into a list.
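An alternative sketch (a variation, not part of the answer above) uses re.finditer instead of re.findall. Each result is then a match object, and Match.lastindex is the number of the last capture group that took part in the match: 1 for square brackets, 2 for parentheses here. That makes the `or` trick unnecessary.

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
pattern = re.compile(r'\[(.*?)\]|\((.*?)\)')

# m.lastindex tells us which alternative matched, so m.group(m.lastindex) is the item
items = [m.group(m.lastindex) for m in pattern.finditer(s)]
print(items)  # ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
```

Both versions give the same list for this input; lastindex simply makes explicit which bracket type matched.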
Upvotes: 2
Reputation: 819
Assuming the string contains no brackets or separators (such as '-') other than the ones in your example, try:
s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
words = []
for elem in s.replace('-', ' ').split():
    if '[' in elem or '(' in elem:
        words.append(elem.strip('[]()'))
Or, with a list comprehension:
words = [elem.strip('[]()') for elem in s.replace('-', ' ').split() if '[' in elem or '(' in elem]
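The assumption matters. Because this approach splits on whitespace first, an item that contains a space is broken apart, and the second half, which has no opening bracket, is dropped. The sketch below wraps the list comprehension in a function (the name bracketed_words is hypothetical) to show both cases.

```python
def bracketed_words(s):
    # Same logic as the list comprehension above: treat '-' as a separator,
    # split on whitespace, keep only tokens that contain an opening bracket
    return [elem.strip('[]()') for elem in s.replace('-', ' ').split()
            if '[' in elem or '(' in elem]

print(bracketed_words('[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'))
# ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']

# 'sprouts]' has no opening bracket, so it is silently dropped
print(bracketed_words('[Brussels sprouts] (tomato)'))
# ['Brussels', 'tomato']
```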
Upvotes: 0
Reputation: 2516
Without any error handling whatsoever (e.g. no checks for nested or unbalanced brackets):
def parse(expr):
    opening = "(["
    closing = ")]"
    result = []
    current_item = ""
    for char in expr:
        if char in opening:
            current_item = ""
            continue
        if char in closing:
            result.append(current_item)
            continue
        current_item += char
    return result

print(parse("(a)(b) stuff (c) [d] more stuff - (xxx)."))
# ['a', 'b', 'c', 'd', 'xxx']
Depending on your needs, this might already be good enough...
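If you do need the error handling mentioned above, one possible extension of parse is sketched below. What counts as an "error" is an assumption of this sketch: it raises ValueError on a nested opening bracket, a closing bracket with no opener, a closing bracket of the wrong type, and a bracket still open at the end of the string. The name parse_strict is hypothetical.

```python
def parse_strict(expr):
    # Map each closing bracket to the opening bracket it must match
    pairs = {')': '(', ']': '['}
    result = []
    current_item = ""
    open_char = None  # the currently open bracket, or None when outside brackets

    for pos, char in enumerate(expr):
        if char in "([":
            if open_char is not None:
                raise ValueError(f"nested bracket {char!r} at position {pos}")
            open_char = char
            current_item = ""
        elif char in ")]":
            if open_char is None:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            if pairs[char] != open_char:
                raise ValueError(f"{char!r} at position {pos} does not close {open_char!r}")
            result.append(current_item)
            open_char = None
        elif open_char is not None:
            # Only collect characters while inside a bracket pair
            current_item += char

    if open_char is not None:
        raise ValueError(f"unclosed {open_char!r} at end of input")
    return result


print(parse_strict("(a)(b) stuff (c) [d] more stuff - (xxx)."))
# ['a', 'b', 'c', 'd', 'xxx']
```

For example, parse_strict("(a [b])") raises ValueError because the '[' opens while '(' is still open.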
Upvotes: 0