Mike Christie

Reputation: 391

Find a closing brace pair, skipping internal open/close pairs, in Python

I have a string that starts "{{ABC..." and which will also contain a closing brace pair. The problem is that it may also contain nested open/close brace pairs, and may contain further brace pairs after the matching closing pair. E.g. this is possible:

    {{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}

In this case I would want the string up to and including "foo2}}". I can see how to do this by writing my own recursive function, but is there a way to match this in a single pass?

Upvotes: 1

Views: 80

Answers (2)

bobble bubble

Reputation: 18515

The regex module on PyPI supports recursion. To target {{ABC, use a lookahead followed by a group that contains the recursed pattern. At (?1), the pattern contained in the first group is applied again recursively.

(?={{ABC)({(?>[^}{]+|(?1))*})

See this demo at regex101 or a Python demo at tio.run


The (?> atomic group ) prevents running into backtracking issues on unbalanced braces.
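
As a rough sketch of how the pattern above might be used from Python: this assumes the third-party regex module is installed (pip install regex), since the standard re module does not support (?1) recursion. The braces are escaped here only for readability, and the sample string is the one from the question.

```python
import regex  # third-party module from PyPI, not the stdlib `re`

# Same pattern as above, with the literal braces escaped for readability.
# (?=\{\{ABC) anchors the match at "{{ABC"; (?1) recurses into group 1.
pattern = r"(?=\{\{ABC)(\{(?>[^}{]+|(?1))*\})"

sample = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"

match = regex.search(pattern, sample)
block = match.group(1) if match else None
print(block)
```

Because the lookahead only succeeds at "{{ABC", the trailing "{{other brace pair}}" is not picked up.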

Upvotes: 2

ILS

Reputation: 1380

You can find all enclosed substrings by scanning the input string only once.

All you need to track is the number of left braces that are currently open. Increase the count when you see a left brace and decrease it when you see a right brace. When the count drops back to 0, you have found one enclosed substring.

def find_enclosed_string(string):
    """Return every top-level balanced brace block found in `string`."""
    depth = 0   # number of left braces currently open
    start = 0   # index where the current top-level block began
    enclosed_list = []
    for i, ch in enumerate(string):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:  # ignore a stray "}" with nothing open
            depth -= 1
            if depth == 0:
                # The block that opened at `start` closes here.
                enclosed_list.append(string[start:i + 1])
    return enclosed_list

string = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"

find_enclosed_string(string)

# ['{{ABC foo bar {{baz}} {{fred}} foo2}}', '{{other brace pair}}']
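
If only the block that starts with "{{ABC" is wanted, as in the question, the same counting idea can stop at the first balanced block. The following is a minimal sketch using only the standard library. The function name first_balanced_block and its opener parameter are made up for illustration and are not part of the original answer.

```python
def first_balanced_block(text, opener="{{ABC"):
    """Return the balanced brace block that starts at the first `opener`.

    Returns None if `opener` is absent or the braces never balance.
    """
    start = text.find(opener)
    if start == -1:
        return None
    depth = 0  # number of left braces currently open
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # The brace opened at `start` has just been closed.
                return text[start:i + 1]
    return None  # ran out of input before the braces balanced


sample = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"
result = first_balanced_block(sample)
print(result)
```

This still scans the string only once and stops as soon as the count returns to 0, so anything after the matching closing pair is never examined.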

Upvotes: 2
