Reputation: 1061

How do I match everything except two characters?

I need to match all characters between double curly braces, but I need to be able to find multiple matches in one large string.

I have been using this RegEx tester because I'm doing this in C#: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx. I also have "SingleLine" checked because I want . to match \n.

Here is an example of the string I'm matching:

<div class="nest-1-2">
    <dl>
    <dt>Type:</dt>
    <dd>{{(Entity)Field Name.separator(, ) > [:Name:]}}</dd>
    <dt>At:</dt>
    <dd>{{(Entity)Field Name > [:Name:]}}</dd>
    <dt>Team:</dt>
    <dd>{{(Entity)Field Name.separator(, ) > [:First Name:] [:Last Name:]}}</dd>
    </dl>
</div>

Here's the Regex that I'm using:

\{\{(?<field>[^>]*)?[ > ]?(?<looptemplate>[^\}\}].*)?\}\}

The problem I'm having is that I want everything in looptemplate to match all text up to the next }}, and this is matching up to the last one instead of the next one. So I'm getting 1 match, which is everything from the first {{ to the last }}. I tried using a negative lookahead, (?!\}\}), but that doesn't seem to work for me. Unfortunately, [^\}\}] doesn't match both curly braces, it only matches one.

I'm not a total noob with regular expressions, but this one has really gotten me. I've looked all around trying to find an answer, so now I'm hoping someone can help me.

I'd really appreciate any help from the experts.

Upvotes: 1

Answers (4)

rnirnber

Reputation: 615

Start of Edit:

Okay so I changed the text file....

<div class="nest-1-2">
    <dl>
    <dt>Type:</dt>
    <dd>{{(Entity)Field Name.separator(, ) > [:Name:]
    foo came up
    boo is here too}}</dd>
    <dt>At:</dt>
    <dd>{{(Entity)Field Name > [:Name:]}}</dd>
    <dt>Team:</dt>
    <dd>{{(Entity)Field Name.separator(, ) > [:First Name:] [:Last Name:]}}</dd>
    </dl>
</div>

Then I added an option to the Regex constructor. Ironically, the option is "Singleline":

System.Text.RegularExpressions.Regex Y = new System.Text.RegularExpressions.Regex("{{(.*?)\\}}", System.Text.RegularExpressions.RegexOptions.Singleline);

End of Edit

I copied and pasted your example string into a flat text file for testing:

using System;
using System.IO;
using System.Text.RegularExpressions;

namespace a
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the sample markup from the test file.
            string X = File.ReadAllText("C:\\Users\\rnirnberger\\Documents\\a.txt");
            // The lazy .*? stops at the first }} it can reach instead of the last one.
            Regex Y = new Regex("{{(.*?)\\}}");
            MatchCollection Z = Y.Matches(X);
            foreach (Match match in Z)
            {
                Console.WriteLine(match.Value);

                // If you want to strip out the double braces:
                //Console.WriteLine(match.Value.Replace("{{", "").Replace("}}", ""));
            }
        }
    }
}

Upvotes: 0

poke

Reputation: 387993

Also, I have "SingleLine" checked because I want . to match \n

If you untick “Single line” it will work, so your . is the problem: with that option it also matches line breaks, and the greedy .* then runs all the way to the last }}. An easy solution would be to use .*? instead of .*, as that will non-greedily select just as much as it needs (instead of greedily selecting as much as possible). Another solution would be to replace the . with something more specific, like a negative look-ahead, as you probably do not want to match another {{ inside of it (or even }}). But in this case the non-greedy solution is much easier.

You should probably make the quantifier of the field character class non-greedy as well, so it won’t match things that are already part of the looptemplate.

Also note that [ > ] is a character class that will select either a space or >. So it will not select " > ". If you want that, just leave the brackets off:

\{\{(?<field>[^>]*?)? > (?<looptemplate>[^}].*?)?\}\}

In your case, as you probably want to make the looptemplate thing optional, you probably want to do it like this though, with a non-capturing group:

\{\{(?<field>[^>]*?)?(?: > (?<looptemplate>[^}].*?))?\}\}

Also one final note; if you want the . to match line breaks, better provide an example where that is necessary.

(Okay, another note, as m.buettner correctly mentioned in his answer, character classes only need to mention each character once; furthermore, you do not need to escape curly braces inside character classes, so it all simplifies to just [^}])

Upvotes: 2

Martin Ender

Reputation: 44279

A few things:

You were using ? on your capturing groups, which contained *. The * means "0 or more times", so the contents are already optional. Using ? doesn't do anything.
```
\{\{(?<field>[^>]*)[ > ]?(?<looptemplate>[^\}\}].*)\}\}
```
[ > ] matches 1 character: either a space or >. You probably meant (?: > ), which matches " > " (ignore the quotes, otherwise SO wouldn't render the spaces) and groups it together.
```
\{\{(?<field>[^>]*)(?: > )?(?<looptemplate>[^\}\}].*)\}\}
```
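
As a quick check of the character-class point, here is a small C# sketch. The class and helper names are made up for illustration; the helper simply anchors a pattern so that it has to match the whole input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // True when the pattern matches the entire input (hypothetical helper).
    public static bool MatchesWhole(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        // The class [ > ] matches exactly one character: a space or >.
        Console.WriteLine(MatchesWhole("[ > ]", ">"));     // True
        Console.WriteLine(MatchesWhole("[ > ]", " > "));   // False

        // The group (?: > ) matches the three-character sequence " > ".
        Console.WriteLine(MatchesWhole("(?: > )", " > ")); // True
    }
}
```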
[^\}\}] is the same as [^\}]. Negated character classes don't work with strings, they only work on every individual character inside, so writing one multiple times doesn't change anything. I guess that's why you tried the negative lookahead. This is right, but you need to check that condition for every single character of the repetition. Otherwise you only check once, that your looptemplate doesn't begin with \}\} but then you fire away with .*. So group . and the lookahead together:
```
\{\{(?<field>[^>]*)(?: > )?(?<looptemplate>(?:(?!\}\}).)*)\}\}
```
Your (?: > ) is optional, so if you have some {{...}} that doesn't contain it (only has the field part), you will get the same problem as before, just this time with [^>]. Include the lookahead here, too:
```
\{\{(?<field>(?:(?!\}})[^>])*)(?: > )?(?<looptemplate>(?:(?!\}\}).)*)\}\}
```
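
To see the effect of checking the lookahead at every character, here is a small C# sketch that uses the pattern above with RegexOptions.Singleline on a tag that spans two lines. The class and method names are made up for illustration, and the input is a shortened stand-in for the question's markup. It only looks at the overall matches; how the text is split between field and looptemplate is a separate matter, since the greedy field group also takes the space before >.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Pattern from the step above. Singleline makes . match \n as well,
    // so only the (?!\}\}) lookahead keeps a tag from running past its own }}.
    private static readonly Regex Tag = new Regex(
        @"\{\{(?<field>(?:(?!\}\})[^>])*)(?: > )?(?<looptemplate>(?:(?!\}\}).)*)\}\}",
        RegexOptions.Singleline);

    // Number of separate {{...}} tags found in the input.
    public static int CountTags(string input)
    {
        return Tag.Matches(input).Count;
    }

    // Text of the first tag, or null if there is none.
    public static string FirstTag(string input)
    {
        Match m = Tag.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string input = "<dd>{{a > b\nc}}</dd> text <dd>{{d}}</dd>";
        Console.WriteLine(CountTags(input)); // 2
    }
}
```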

By the way, an alternative to using negated character classes or lookaheads is to use ungreedy repetition. If you can use negated character classes, that is usually preferable, because it's equally readable but usually more efficient than the ungreedy modifier, since it does not require backtracking. In your case you have to use the lookahead (because there is a pattern of two consecutive characters you don't want to go past, instead of just one character). In that case, the lookahead might cancel out the performance gains from avoiding backtracking, plus the lookahead is usually a bit less readable. So you might just want to go with an ungreedy repetition here (append ? to the repetition quantifier):

\{\{(?<field>(?:(?!\}})[^>])*)(?: > )?(?<looptemplate>.*?)\}\}

Note that you cannot use an ungreedy repetition for field, because (?: > ) is optional. That would lead to field being empty and everything else (including a possible " > ") being matched by looptemplate. Unless you include the > in an optional group together with looptemplate:

\{\{(?<field>[^>]*?)(?: > (?<looptemplate>.*?))?\}\}
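
A minimal sketch of how this last pattern might be used from C# to pull out both named groups, assuming RegexOptions.Singleline as in the question. The class name, the method name, and the use of key/value pairs are illustrative choices, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateTags
{
    // Pattern from the paragraph above; Singleline lets . match \n inside looptemplate.
    private static readonly Regex Tag = new Regex(
        @"\{\{(?<field>[^>]*?)(?: > (?<looptemplate>.*?))?\}\}",
        RegexOptions.Singleline);

    // One (field, looptemplate) pair per {{...}} tag.
    // The looptemplate value is null when the tag has no " > " part.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Tag.Matches(input))
        {
            Group loop = m.Groups["looptemplate"];
            result.Add(new KeyValuePair<string, string>(
                m.Groups["field"].Value,
                loop.Success ? loop.Value : null));
        }
        return result;
    }

    public static void Main()
    {
        string html =
            "<dd>{{(Entity)Field Name.separator(, ) > [:Name:]}}</dd>\n" +
            "<dd>{{(Entity)Field Name > [:Name:]}}</dd>\n" +
            "<dd>{{(Entity)Field Name.separator(, ) > [:First Name:] [:Last Name:]}}</dd>";

        foreach (var tag in Parse(html))
            Console.WriteLine("field = '{0}', looptemplate = '{1}'", tag.Key, tag.Value);
    }
}
```

For the second tag in the sample, this should give field "(Entity)Field Name" and looptemplate "[:Name:]". Checking Success on the looptemplate group is what distinguishes a tag without a " > " part from one whose template is empty.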

One final note. This is only a matter of taste, but let me introduce you to a different form of escaping. Many meta-characters are not meta-characters when inside a character class (only ], -, ^ and \ still are). So you can wrap your meta-character in a character class to escape it:

[{][{](?<field>[^>]*?)(?: > (?<looptemplate>.*?))?[}][}]

As I said, just a suggestion, but for most characters, I find this more readable than using a backslash.

Upvotes: 5

femtoRgon

Reputation: 33351

What about this:

\{\{.*?\}\}

.*? is similar to .* but employs lazy matching, instead of greedy. That means that it stops matching, and attempts to continue to match the rest of the regex as soon as possible, rather than greedy matching which attempts to consume as much as possible before moving on to the rest of the regex.

So, applied to: "{{this}} and that}}"

\{\{.*?\}\} matches "{{this}}"

and

\{\{.*\}\} matches "{{this}} and that}}"
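
The same comparison as a small C# sketch, in case you want to run it yourself. The class and helper names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Returns the text of the first match, or null if the pattern does not match.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        const string input = "{{this}} and that}}";

        // Lazy: stops at the first }} it can reach.
        Console.WriteLine(FirstMatch(@"\{\{.*?\}\}", input)); // {{this}}

        // Greedy: runs to the last }} in the input.
        Console.WriteLine(FirstMatch(@"\{\{.*\}\}", input));  // {{this}} and that}}
    }
}
```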

Upvotes: 0
