Reputation: 1
I would like to use a regex to do the following in Python:
I am given a list of strings such as: 'abc01 - [def02] - ghi03 - jkl04'
Each string will have a different number of items. Some items are wrapped in brackets and some are not.
Can someone help me with a regex that matches only the items not in brackets? Dashes and spaces would need to be removed. So for the example above the output would be: [abc01, ghi03, jkl04]
Thanks
Upvotes: 0
Views: 114
Reputation: 346
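From the description above, you can use re.findall() to match every run of word characters (the shorthand \w matches letters, digits, and the underscore). Note that this also returns the bracketed item def02, so the result still needs filtering to exclude items in brackets.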
>>> import re
>>> re.findall(r'\w+', 'abc01 - [def02] - ghi03 - jkl04')
['abc01', 'def02', 'ghi03', 'jkl04']
Upvotes: 0
Reputation: 4145
The following regex will solve your problem:
\b(?<!\[)\w+
The Python code is then:
import re
for match in re.finditer(r"\b(?<!\[)\w+", input_line):
    item = match.group()
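To try the pattern end to end, here is a minimal runnable sketch. It assumes `input_line` holds the example string from the question; `pattern` and `items` are illustrative names that are not part of the original answer.

```python
import re

# Example string taken from the question.
input_line = 'abc01 - [def02] - ghi03 - jkl04'

# \b(?<!\[)\w+ : a run of word characters that starts at a word
# boundary and is not immediately preceded by '['.
pattern = re.compile(r"\b(?<!\[)\w+")

# Collect every match as a plain string.
items = [match.group() for match in pattern.finditer(input_line)]
print(items)  # ['abc01', 'ghi03', 'jkl04']
```

The lookbehind only checks for an opening bracket, so an item written as `def02]` with no opening bracket would still be matched.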
Notes:
\b asserts that the item starts at a word break, not in the middle of an item
(?<!\[) asserts that the item wasn't preceded by a [
\w+ matches an item of at least one consecutive word character, as many as possible
Upvotes: 0
Reputation: 25609
>>> a='abc01 - [def02] - ghi03 - jkl04'
>>> [ i for i in a.split(" - ") if "[" not in i ]
['abc01', 'ghi03', 'jkl04']
Upvotes: 2
Reputation: 38392
Is regex really the best tool for the job?
>>> S = 'abc01 - [def02] - ghi03 - jkl04'
>>> [x for x in S.split(' - ') if not (x.startswith('[') or x.endswith(']'))]
['abc01', 'ghi03', 'jkl04']
Upvotes: 9