Reputation: 321
I have a string containing exactly one pair of parentheses (and some words between them), and lots of other words.
How would one create a regex to split the string into [ words before (, words between (), words after )]?
e.g.
line = "a bbbb cccc dd ( ee fff ggg ) hhh iii jk"
would be split into
[ "a bbbb cccc dd", "ee fff ggg", "hhh iii jk" ]
I've tried
line = re.compile("[^()]+").split(line)
but it doesn't work.
Upvotes: 0
Views: 42
Reputation: 1979
It seems that you also want to remove the whitespace around the parentheses, i.e., the whitespace before and after ( and ). You could try:
>>> import re
>>> line = "a bbbb cccc dd ( ee fff ggg ) hhh iii jk"
>>> re.split(r'\s*[\(\)]\s*', line)
['a bbbb cccc dd', 'ee fff ggg', 'hhh iii jk']
>>>
>>> # to make it look as in your description ...
>>> line = re.compile(r'\s*[\(\)]\s*').split(line)
>>> line
['a bbbb cccc dd', 'ee fff ggg', 'hhh iii jk']
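As a rough sketch of why the attempt in the question fails: `[^()]+` matches the runs of text that are not parentheses, so `split` treats the words themselves as separators and discards them, leaving only the parentheses. The helper name `split_on_parens` below is made up for illustration and assumes the input has exactly one pair of parentheses, as the question states.

```python
import re

line = "a bbbb cccc dd ( ee fff ggg ) hhh iii jk"

# The question's pattern matches runs of non-parenthesis characters,
# so split() throws the words away and keeps only what lies between them.
attempt = re.split(r"[^()]+", line)


def split_on_parens(text):
    """Hypothetical helper: split on '(' or ')' plus any surrounding whitespace."""
    return re.split(r"\s*[()]\s*", text)


parts = split_on_parens(line)
print(attempt)
print(parts)
```

Splitting on the parentheses rather than on the words is what keeps the three word groups in the result.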
Upvotes: 2
Reputation: 10962
To split the string into three parts, I think the simplest option is to use three capture groups: (some_regex)(another_regex)(yet_another_regex). In your case, the first part is any run of characters other than (, followed by a literal (, then any run of characters other than ), followed by a literal ), and finally any remaining characters.
Therefore the regex is ([^(]*)\(([^)]*)\)(.*), which you can then use to retrieve the groups (your desired output, apart from the spaces next to the parentheses):
>>> import re
>>> line = "a bbbb cccc dd ( ee fff ggg ) hhh iii jk"
>>> pattern = re.compile(r'([^(]*)\(([^)]*)\)(.*)')
>>> pattern.match(line).groups()
('a bbbb cccc dd ', ' ee fff ggg ', ' hhh iii jk')
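The groups keep the spaces that sit next to the parentheses. One possible way to get the exact list from the question is to strip each group afterwards. This is only a sketch, and the `None` check is an added assumption for lines that contain no parenthesised part:

```python
import re

line = "a bbbb cccc dd ( ee fff ggg ) hhh iii jk"
pattern = re.compile(r"([^(]*)\(([^)]*)\)(.*)")

match = pattern.match(line)
if match is None:
    # No "( ... )" in the line, so there is nothing to split.
    parts = []
else:
    # Trim the whitespace left over around each captured group.
    parts = [group.strip() for group in match.groups()]

print(parts)
```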
In the pattern above:
([^(]*) is the first group
([^)]*) is the second group
(.*) is the last group
Upvotes: 1