Reputation: 13
I'm trying to match some variable names in an HTML document to populate a dictionary. I have the HTML:
<div class="no_float">
<b>{node_A_test00:02d}</b>{{css}}
<br />
Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}
<br />
</div>
<div class="sw_sel_container">
Switch selections:
<table class="sw_sel">
<tr>
<td class="{sw_sel_node_A_03}">1</td>
<td class="{sw_sel_node_A_03}">2</td>
<td class="{sw_sel_node_A_03}">3</td>
<td class="{sw_sel_node_A_04}">4</td>
<td class="{sw_sel_node_A_05}">5</td>
I want to match the text between { and either } or :. But if it starts with {{, I don't want to match it at all (I will be using this for inline CSS).
So far I have the regular expression
(?<=\{)((?!{).*?)(?=\}|:)
but this is still matching text inside {{css}}.
Upvotes: 0
Views: 505
Reputation: 16499
I see that you've already found a solution that works, but I thought it might be worthwhile to explain what the problem with your original regex is.
(?<=\{) means that a { must precede whatever matches next. Fair enough.
((?!{).*?) will match anything that starts with a character other than {. Okay, so we're only matching things inside the braces. Good.
But now consider what happens when you have two opening braces: {{bar}}. Consider the substring bar. What precedes the b? A {. Does bar start with {? Nope. So the regex will consider this a match.
You have, of course, prevented the regex from matching {bar (the lazy match stops just before the first }), which is what it would do if you left the (?!{) out of your pattern, because {bar starts with a {. But as soon as the regex engine determines that no valid match starts on the { character, it moves on to the next character, b, and sees that a match starts there.
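The behaviour described above can be checked in Python. This is only a minimal sketch: the variable names are illustrative, and the literal braces are escaped for readability, which should not change what the question's pattern matches.

```python
import re

# The pattern from the question, with literal braces escaped.
original = re.compile(r'(?<=\{)((?!\{).*?)(?=\}|:)')
# The same pattern without the (?!\{) guard, for comparison.
unguarded = re.compile(r'(?<=\{)(.*?)(?=\}|:)')

with_guard = original.findall('{{bar}}')
without_guard = unguarded.findall('{{bar}}')

print(with_guard)     # ['bar']   (match starts at b, right after the second {)
print(without_guard)  # ['{bar']  (match starts at the second {)
```

The guard only rules out a match that begins on a { character; it says nothing about how many braces come before the match.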
Now, just for kicks, here's the regex I'd use:
(?<!{){([^{}:]+)[}:](?!})
(?<!{): the match shouldn't be preceded by {.
{: the match starts with an open brace.
([^{}:]+): group everything that isn't an open-brace, close-brace, or colon. This is the part of the match that we actually want.
[}:]: end the match with a close-brace or colon.
(?!}): the match shouldn't be followed by }.
Upvotes: 0
Reputation: 97968
This seems to be working:
(?<=(?<!{){)[^{}:]+
and this with a capture:
(?<!{){([^{}:]+)
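As a rough sketch of how the capture version could be used to populate a dictionary in Python: the names template and placeholders are made up for illustration, the sample text is abridged from the question, and the None values are just stand-ins for whatever the real values would be.

```python
import re

# Capture variant from this answer, with literal braces escaped.
pattern = re.compile(r'(?<!\{)\{([^{}:]+)')

# Abridged sample based on the HTML in the question.
template = '''<b>{node_A_test00:02d}</b>{{css}}
Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}
<td class="{sw_sel_node_A_03}">1</td>
<td class="{sw_sel_node_A_03}">2</td>'''

names = pattern.findall(template)

# dict.fromkeys drops duplicates, keeps first-seen order, and sets each value to None.
placeholders = dict.fromkeys(names)
print(list(placeholders))
```

Note that {{css}} contributes nothing: the first { is followed by another brace, and the second { is preceded by one.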
Upvotes: 0
Reputation: 298256
You could do something like this:
import re

re.findall(r'''
    (?<!\{)   # No opening bracket before
    \{        # Opening bracket
    ([^}]+)   # Stuff inside brackets
    \}        # Closing bracket
    (?!\})    # No closing bracket after
''', '{foo} {{bar}} {foo}', flags=re.VERBOSE)
# ['foo', 'foo']
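One thing to be aware of with this pattern: ([^}]+) captures everything up to the closing brace, so a placeholder such as {node_A_test00:02d} from the question comes back with its format spec attached. A possible way to handle that, sketched here with illustrative variable names, is to split each capture on the first colon afterwards.

```python
import re

# Same pattern as above, written on one line.
pattern = re.compile(r'(?<!\{)\{([^}]+)\}(?!\})')

text = '{node_A_test00:02d} {{css}} {block_mask_lower_node_A}'

raw = pattern.findall(text)
# Keep only the part before the first colon, if there is one.
fields = [item.split(':', 1)[0] for item in raw]

print(raw)     # ['node_A_test00:02d', 'block_mask_lower_node_A']
print(fields)  # ['node_A_test00', 'block_mask_lower_node_A']
```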
Upvotes: 1