user2341223

Reputation: 13

Python - Match string between { } characters, but not between {{ }}

I'm trying to match some variable names in an HTML document to populate a dictionary. I have the following HTML:

<div class="no_float">
    <b>{node_A_test00:02d}</b>{{css}}
    <br />
    Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}
    <br />
</div>
<div class="sw_sel_container">
    Switch selections: 
    <table class="sw_sel">
        <tr>
            <td class="{sw_sel_node_A_03}">1</td>
            <td class="{sw_sel_node_A_03}">2</td>
            <td class="{sw_sel_node_A_03}">3</td>
            <td class="{sw_sel_node_A_04}">4</td>
            <td class="{sw_sel_node_A_05}">5</td>

I want to match the text between { and either } or :. But if it starts with {{, I don't want to match it at all (I will be using {{ }} for inline CSS).

So far I have the regular expression

(?<=\{)((?!{).*?)(?=\}|:)

but this is still matching text inside {{css}}.

Upvotes: 0

Views: 505

Answers (3)

Kyle Strand

Reputation: 16499

I see that you've already found a solution that works, but I thought it might be worthwhile to explain what the problem with your original regex is.

  • (?<=\{) means that a { must precede whatever matches next. Fair enough.
  • ((?!{).*?) will match anything that starts with a character other than {. Okay, so we're only matching things inside the braces. Good.

But now consider what happens when you have two opening braces: {{bar}}. Consider the substring bar. What precedes the b? A {. Does bar start with {? Nope. So the regex will consider this a match.

You have, of course, prevented the regex from matching {bar}, which is what it would do if you left the (?!{) out of your pattern, because {bar} starts with a {. But as soon as the regex engine determines that no valid match starts on the { character, it moves on to the next character--b--and sees that a match starts there.
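To see this concretely, here is a minimal sketch that runs the question's pattern against a short test string. The string is made up for illustration; the pattern is copied from the question unchanged.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r'(?<=\{)((?!{).*?)(?=\}|:)')

# Hypothetical test string: one single-brace name, one double-brace block.
sample = '{foo} {{css}}'

# findall returns the contents of the single capture group for each match.
matches = original.findall(sample)
print(matches)  # ['foo', 'css']
```

The attempt that starts on the second { is rejected by (?!{), but the next attempt starts on c, which is preceded by a { and does not itself start with one, so css is returned.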

Now, just for kicks, here's the regex I'd use:

(?<!{){([^{}:]+)[}:](?!})

  • (?<!{) : the match shouldn't be preceded by {.
  • { : the match starts with an open brace.
  • ([^{}:]+) : group everything that isn't an open-brace, close-brace, or colon. This is the part of the match that we actually want.
  • [}:] : end the match with a close-brace or colon.
  • (?!}) : the match shouldn't be followed by }.
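A quick way to check the pieces listed above is to run them in Python. This is only a sketch: it assumes the lookarounds are spelled (?<!{) and (?!}), which is how Python's re module writes a negative lookbehind and a negative lookahead, and it uses a shortened copy of the HTML from the question. The names html, names and placeholders are just for illustration.

```python
import re

# Assumed spelling: (?<!{) is a negative lookbehind, (?!}) a negative lookahead.
pattern = re.compile(r'(?<!{){([^{}:]+)[}:](?!})')

# Shortened copy of the HTML from the question.
html = (
    '<b>{node_A_test00:02d}</b>{{css}}\n'
    'Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}\n'
)

# One capture group, so findall returns just the variable names.
names = pattern.findall(html)
print(names)

# Seed a dictionary with the names; the empty-string default is arbitrary.
placeholders = dict.fromkeys(names, '')
```

The {{css}} block is skipped because the first { is immediately followed by another {, which [^{}:]+ refuses to match, and the second { fails the lookbehind.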

Upvotes: 0

perreal

Reputation: 97968

This seems to be working:

(?<=(?<!{){)[^{}:]+

and this with a capture:

(?<!{){([^{}:]+)

Upvotes: 0

Blender

Reputation: 298256

You could do something like this:

import re

re.findall(r'''
    (?<!\{)    # No opening bracket before
    \{         # Opening bracket
      ([^}]+)  # Stuff inside brackets
    \}         # Closing bracket
    (?!\})     # No closing bracket after
''', '{foo} {{bar}} {foo}', flags=re.VERBOSE)
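Note that with [^}]+ the captured text includes any format spec after a colon, as in {node_A_test00:02d} from the question. One possible way to handle that, sketched here with a made-up test string and hypothetical variable names, is to keep the pattern as it is and cut each result at the first colon afterwards:

```python
import re

# Same verbose pattern as above, compiled once for reuse.
pattern = re.compile(r'''
    (?<!\{)    # No opening bracket before
    \{         # Opening bracket
      ([^}]+)  # Stuff inside brackets (may include a ":format" part)
    \}         # Closing bracket
    (?!\})     # No closing bracket after
''', flags=re.VERBOSE)

# Hypothetical test string based on the HTML in the question.
text = '<b>{node_A_test00:02d}</b>{{css}} {block_mask_lower_node_A}'

raw = pattern.findall(text)

# Keep only the part before the first colon, if there is one.
names = [item.partition(':')[0] for item in raw]
print(raw)
print(names)
```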

Upvotes: 1
