pepsi

Reputation: 321

Regex for a simple pattern in Python

I have a string containing exactly one pair of parentheses (and some words between them), and lots of other words.

How would one create a regex to split the string into [ words before (, words between (), words after )]?

e.g.

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"

would be split into

[ "a   bbbb cccc     dd", "ee fff ggg", "hhh iii jk" ]

I've tried

line = re.compile("[^()]+").split(line)

but it doesn't work.

Upvotes: 0

Views: 42

Answers (2)

Nikolaos Chatzis

Reputation: 1979

It seems that you also want to drop the whitespace around the parentheses, i.e., any whitespace immediately before and after ( and ). You can make that whitespace part of the separator:

>>> import re
>>> line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
>>> re.split(r'\s*[\(\)]\s*', line)
['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
>>>
>>> # to make it look as in your description ...
>>> line = re.compile(r'\s*[\(\)]\s*').split(line)
>>> line
['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
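As a side note on why the attempt in the question fails, here is a minimal sketch (the variable names `attempt` and `parts` are only illustrative). The pattern `[^()]+` matches the runs of words rather than the parentheses, so `split` treats the words as separators and throws them away. Using the same pattern with `re.findall` keeps the words instead:

```python
import re

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"

# [^()]+ matches the text around the parentheses, so split() discards
# that text and keeps only what lies between matches: the parentheses.
attempt = re.split(r'[^()]+', line)
print(attempt)  # ['', '(', ')', '']

# findall() returns the matched runs themselves; strip() trims the padding.
parts = [s.strip() for s in re.findall(r'[^()]+', line)]
print(parts)  # ['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
```

Either approach should work for a string with exactly one pair of parentheses; which one reads better is mostly a matter of taste.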

Upvotes: 2

cglacet

Reputation: 10962

To split the string into three parts, I think the simplest option is to use three capture groups: (some_regex)(another_regex)(yet_another_regex). In your case, the first part is any run of characters other than (, followed by a literal (; then any run of characters other than ), followed by a literal ); and finally whatever remains.

Therefore the regex is ([^(]*)\(([^)]*)\)(.*), which you can then use to retrieve the groups (your desired output, apart from the whitespace around the parentheses, which the groups keep):

>>> import re
>>> line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
>>> pattern = re.compile(r'([^(]*)\(([^)]*)\)(.*)')
>>> pattern.match(line).groups()
('a   bbbb cccc     dd     ', ' ee fff ggg ', '    hhh iii jk')

With:

  • ([^(]*) the first group
  • ([^)]*) the second group
  • (.*) the last group
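If you want exactly the list from the question, one option is to strip each group afterwards. This is only a sketch: the helper name `split_on_parens` is made up for illustration, and raising `ValueError` when the pattern does not match is an assumption about how you would want to handle such input.

```python
import re

# Same pattern as above: text before "(", text inside "()", text after ")".
PATTERN = re.compile(r'([^(]*)\(([^)]*)\)(.*)')

def split_on_parens(line):
    """Return [before, inside, after] with surrounding whitespace removed."""
    match = PATTERN.match(line)
    if match is None:
        # Assumption: input without a "(...)" pair is treated as an error.
        raise ValueError("expected one pair of parentheses")
    return [part.strip() for part in match.groups()]

line = "a   bbbb cccc     dd     ( ee fff ggg )    hhh iii jk"
print(split_on_parens(line))
# ['a   bbbb cccc     dd', 'ee fff ggg', 'hhh iii jk']
```

Note that `.` does not match a newline by default, so this assumes the input is a single line.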

Upvotes: 1
