Eric

Reputation: 1

Python Regex help

I would like to use a regex to do the following in Python:

I am given a list of strings such as: 'abc01 - [def02] - ghi03 - jkl04'

Each string will have a different number of items. Some items will be wrapped in square brackets and some will not.

Can someone help me with a regex that matches only the items not in brackets? The dashes and spaces need to be removed, so for the example above the output would be: [abc01, ghi03, jkl04]

Thanks

Upvotes: 0

Views: 114

Answers (4)

Tim

Reputation: 346

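From the description above, you can use findall() to match every run of word characters (the shorthand \w matches letters, digits, and the underscore). Note that this also returns the bracketed item def02, so the result still has to be filtered to leave out bracketed items.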

>>> import re
>>> re.findall(r'\w+', 'abc01 - [def02] - ghi03 - jkl04')
['abc01', 'def02', 'ghi03', 'jkl04']
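
The call above also returns the bracketed item def02. One way to keep findall() and still drop bracketed items is to match a whole bracketed chunk first and capture only the words outside it. This is a minimal sketch, assuming brackets are never nested; the name unbracketed_items is made up for this illustration.

```python
import re

def unbracketed_items(line):
    """Return word-like items that are not inside [...] (hypothetical helper)."""
    # First alternative consumes a whole bracketed chunk without capturing it;
    # second alternative captures a run of word characters outside brackets.
    pattern = r'\[[^\]]*\]|(\w+)'
    # findall() returns the capture group, which is '' when the bracket
    # alternative matched, so empty strings are filtered out.
    return [item for item in re.findall(pattern, line) if item]

print(unbracketed_items('abc01 - [def02] - ghi03 - jkl04'))
# ['abc01', 'ghi03', 'jkl04']
```

Because the bracketed chunk is consumed as a unit, this variant should also skip bracketed items that contain spaces.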

Upvotes: 0

Staffan Nöteberg

Reputation: 4145

The following regex will solve your problem:

\b(?<!\[)\w+

The Python code is then:

import re
input_line = 'abc01 - [def02] - ghi03 - jkl04'
for match in re.finditer(r"\b(?<!\[)\w+", input_line):
    item = match.group()  # 'abc01', then 'ghi03', then 'jkl04'

Notes:

  • \b asserts that the item starts at a word boundary, not in the middle of an item
  • The negative lookbehind (?<!\[) asserts that the item wasn't preceded by a [
  • \w+ matches an item of at least one consecutive word character, as many as possible
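
To see the three pieces working together, here is a small sketch that wraps the pattern in a function; items_outside_brackets is a name invented for this example. It assumes every bracketed item is a single word directly after the [, as in the question's sample. A bracketed item containing a space would leak its later words, because only the first word is preceded by [.

```python
import re

# Pattern from the answer: word boundary, not preceded by '[', then word chars.
PATTERN = re.compile(r"\b(?<!\[)\w+")

def items_outside_brackets(input_line):
    """Collect every match of PATTERN in input_line (illustrative helper)."""
    return [match.group() for match in PATTERN.finditer(input_line)]

print(items_outside_brackets('abc01 - [def02] - ghi03 - jkl04'))
# ['abc01', 'ghi03', 'jkl04']

# Limitation: only the word right after '[' is rejected.
print(items_outside_brackets('[def 02] - ghi03'))
# ['02', 'ghi03']
```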

Upvotes: 0

kurumi

Reputation: 25609

>>> a = 'abc01 - [def02] - ghi03 - jkl04'
>>> [i for i in a.split(" - ") if "[" not in i]
['abc01', 'ghi03', 'jkl04']

Upvotes: 2

bradley.ayers

Reputation: 38392

Is regex really the best tool for the job?

>>> S = 'abc01 - [def02] - ghi03 - jkl04'
>>> [x for x in S.split(' - ') if not (x.startswith('[') or x.endswith(']'))]
['abc01', 'ghi03', 'jkl04']
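
The question mentions a list of strings, and real input may not always have exactly one space around each dash. Here is a hedged sketch of the same split-and-filter idea that tolerates uneven spacing. The name plain_items and the sample lines are made up for illustration, and it assumes the items themselves never contain a dash.

```python
def plain_items(line):
    """Split on '-', trim whitespace, and drop bracketed or empty items."""
    parts = (part.strip() for part in line.split('-'))
    return [p for p in parts
            if p and not (p.startswith('[') or p.endswith(']'))]

# Hypothetical sample input; the second line has irregular spacing.
lines = [
    'abc01 - [def02] - ghi03 - jkl04',
    'mno05-[pqr06] -  stu07',
]
for line in lines:
    print(plain_items(line))
# ['abc01', 'ghi03', 'jkl04']
# ['mno05', 'stu07']
```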

Upvotes: 9
