Find string between reoccurring substrings?

Question

I have a string similar to

s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"

I'm trying to extract the text inside each top-level pair of parentheses, to get:

['test1 or (test2 or test3)', 'test4 and (test6)', 'test7 or test8']

I have tried

result = re.search('%s(.*)%s' % ("(", ")"), s).group(1)
result =(s[s.find("(")+1 : s.find(")")])
result = re.search('((.*))', s)

Jean-François Fabre · Accepted Answer

You have nested parentheses. That calls for a real parser or, if you don't want to go that far, a back-to-basics approach: scan the string character by character and track the nesting level, so that each top-level group ends where the nesting level returns to 0.

Then apply a small hack to strip whatever precedes each group, such as the "and" tokens.

Here is the code I've written for this. It's not short, but not very complex either, and it is self-contained with no extra libraries:

s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"

nesting_level = 0
previous_group_index = 0

def rework_group(group):
    # not the brightest function, but it works; it may need tuning
    # this is not the core of the algorithm, just simple string operations
    # look for the first opening parenthesis and remove what comes before it
    idx = group.find("(")
    if idx!=-1:
        group = group[idx:]
    else:
        # no parentheses: split according to blanks, keep last item
        group = group.split()[-1]
    return group

result = []

for i,c in enumerate(s):
    if c=='(':
        nesting_level += 1
    elif c==')':
        nesting_level -= 1
        if nesting_level == 0:
            result.append(rework_group(s[previous_group_index:i+1]))
            previous_group_index = i+1

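# handle whatever follows the last top-level group (here: " and test9");
# skip it when empty, otherwise split()[-1] would raise an IndexError
remainder = s[previous_group_index:]
if remainder.strip():
    result.append(rework_group(remainder))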

Result:

>>> result
['(test1 or (test2 or test3))',
 '(test4 and (test6))',
 '(test7 or test8)',
 'test9']
>>>
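
Note that this output keeps the outer parentheses and the trailing test9, while the question asked only for the contents of the parenthesized groups. The same nesting counter can be adapted for that. What follows is a minimal sketch, not part of the original answer: the function name top_level_groups is hypothetical, and it assumes the parentheses are balanced (it raises ValueError otherwise).

```python
def top_level_groups(text):
    """Return the contents of each top-level parenthesized group in text."""
    groups = []
    depth = 0      # current nesting level
    start = None   # index just past the '(' that opened the current top-level group
    for i, c in enumerate(text):
        if c == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif c == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            depth -= 1
            if depth == 0:
                # back at level 0: a top-level group just closed
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return groups


s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"
print(top_level_groups(s))
# ['test1 or (test2 or test3)', 'test4 and (test6)', 'test7 or test8']
```

Text outside any parentheses, such as test9, is simply ignored here, so no rework step is needed.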
