John Hennesey

Reputation: 375

Finding two open brackets missing two closing brackets

I would like a regex that matches any word that begins with two opening brackets but lacks the two matching closing brackets. For example:

Good afternoon Mr. [[Insured.InsuredName]] - Your policy [[Insured.CurrentPolicy is out of date.

In this case, "Insured.CurrentPolicy" would be caught. I'm new to lookaheads and lookbehinds, so I'd appreciate your help.

Upvotes: 0

Views: 168

Answers (3)

Jim Driscoll

Reputation: 904

In regular expressions, "not" is generally your enemy, so for this case I'd suggest just going for:

\[\[[a-zA-Z.]+\]?([^a-zA-Z.\]]|$)

It'll miss some cases like "[[Foo.Bar]Baz" but it's fairly readable and will catch a lot of cases.
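
To see what this pattern returns in practice, here is a minimal C# sketch. The class and method names (PositivePatternDemo, FirstMatch) are made up for illustration; only the pattern comes from the answer. Note that the match text includes the leading [[ and the one trailing character consumed by the final group, so it needs trimming if you only want the name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PositivePatternDemo
{
    // Pattern from the answer above, unchanged.
    public const string Pattern = @"\[\[[a-zA-Z.]+\]?([^a-zA-Z.\]]|$)";

    // Hypothetical helper: returns the text of the first match, or null if nothing matches.
    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string sample = "Good afternoon Mr. [[Insured.InsuredName]] - Your policy [[Insured.CurrentPolicy is out of date.";

        // The properly closed placeholder is skipped; the broken one is found,
        // together with the space that follows it.
        Console.WriteLine("'" + FirstMatch(sample) + "'");

        // The case the answer says is missed: a single ] followed by a letter.
        Console.WriteLine(FirstMatch("[[Foo.Bar]Baz") ?? "(no match)");
    }
}
```

For the sample sentence this should report "[[Insured.CurrentPolicy " (with the trailing space), and nothing for "[[Foo.Bar]Baz".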

Upvotes: 1

Steve Kline

Reputation: 805

You could also try this. I got some errors with Wiktor's syntax, which may be specific to certain regex flavors (atomic groups and \p{...} classes are not supported everywhere). This one seems to work in most regex flavors.

(\[\[\s*[a-zA-Z]+\.[a-zA-Z]+\b)(?!]])
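
A quick C# sketch of how this pattern behaves, assuming names always have the form Word.Word. The class and method names (OneDotPatternDemo, Captures) are hypothetical. Two things worth knowing: group 1 includes the leading [[, and because the pattern expects exactly one dot, a correctly closed name with two dots such as [[A.B.C]] is still reported (as "[[A.B").

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OneDotPatternDemo
{
    // Pattern from the answer above, unchanged.
    public const string Pattern = @"(\[\[\s*[a-zA-Z]+\.[a-zA-Z]+\b)(?!]])";

    // Hypothetical helper: collects the group 1 value of every match.
    public static string[] Captures(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        string sample = "Good afternoon Mr. [[Insured.InsuredName]] - Your policy [[Insured.CurrentPolicy is out of date.";

        // Only the unclosed placeholder is reported; \b stops the engine from
        // shortening "InsuredName" to get past the lookahead.
        Console.WriteLine(string.Join(", ", Captures(sample)));

        // A closed placeholder with two dots is a false positive for this pattern.
        Console.WriteLine(string.Join(", ", Captures("[[A.B.C]]")));
    }
}
```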


Upvotes: 1

Wiktor Stribiżew

Reputation: 626950

You may try using

\[\[(?>(\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*))(?!]])


Explanation:

  • \[\[ - two [ symbols
  • (?> - start of an atomic group, which prevents backtracking into its subpatterns: if the lookahead after it fails, the engine cannot shorten the matched name to force a match, so no match is returned at that position
  • (\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*) - capturing group 1, which matches:
    • \p{Lu}\p{L}* - an uppercase letter followed by zero or more letters (NOTE: replace \p{L}* with \w* to match alphanumeric and underscore characters)
    • (?:\.\p{Lu}\p{L}*)* - zero or more sequences of a dot followed by an uppercase letter and zero or more letters (the same note applies).
  • ) - end of the atomic group.
  • (?!]]) - a negative lookahead that will fail the match if there are two consecutive ]] right after the matched text.
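
To illustrate why the atomic group matters, here is a small C# sketch that runs the pattern with and without it on the sentence from the question. The class name, the helper Names, and the NonAtomic variant are made up for this comparison and are not part of the suggested solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Pattern from the answer, with the atomic group (?>...).
    public const string Atomic = @"\[\[(?>(\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*))(?!]])";

    // Same pattern with the atomic group removed, for comparison only.
    public const string NonAtomic = @"\[\[(\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*)(?!]])";

    // Hypothetical helper: collects the group 1 value of every match.
    public static string[] Names(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        string sample = "Good afternoon Mr. [[Insured.InsuredName]] - Your policy [[Insured.CurrentPolicy is out of date.";

        // With the atomic group, only the unclosed placeholder is reported.
        Console.WriteLine(string.Join(", ", Names(Atomic, sample)));

        // Without it, the engine gives back the final "e" of InsuredName so the
        // lookahead no longer sees "]]", and a truncated name is reported too.
        Console.WriteLine(string.Join(", ", Names(NonAtomic, sample)));
    }
}
```

The first call should yield only Insured.CurrentPolicy, while the second also yields the truncated Insured.InsuredNam.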

If you just need to match any characters other than whitespace and ] after [[, you may follow 4castle's approach:

\[\[(?>([^]\s]+))(?!]])


The explanation is much the same, except that [^]\s]+ matches one or more characters other than ] and whitespace.
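
As a rough comparison of the two patterns, the sketch below feeds both a made-up input whose names contain digits, underscores and lowercase initials. The class name, the helper Names and the test string are assumptions for illustration. On such names the letters-only pattern stops early at the underscore, while the permissive one treats everything up to the next ] or whitespace as the name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PermissivePatternDemo
{
    // Letters-only pattern from the first part of the answer.
    public const string LettersOnly = @"\[\[(?>(\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*))(?!]])";

    // Permissive pattern: anything except ] and whitespace counts as part of the name.
    public const string Permissive = @"\[\[(?>([^]\s]+))(?!]])";

    // Hypothetical helper: collects the group 1 value of every match.
    public static string[] Names(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        // Made-up input: the first placeholder is closed, the second is not.
        string input = "[[Policy_2.Id]] and [[policy_3.id is missing";

        // Stops at the underscore, so the closed placeholder is reported as "Policy";
        // the lowercase name is not matched at all.
        Console.WriteLine(string.Join(", ", Names(LettersOnly, input)));

        // Skips the closed placeholder and reports the unclosed one in full.
        Console.WriteLine(string.Join(", ", Names(Permissive, input)));
    }
}
```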

C# code:

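using System.Linq;
using System.Text.RegularExpressions;

// Collect the Group 1 value (the name without the leading [[) from every match.
var results = Regex.Matches(input, @"\[\[(?>(\p{Lu}\p{L}*(?:\.\p{Lu}\p{L}*)*))(?!]])")
       .Cast<Match>()
       .Select(m => m.Groups[1].Value)
       .ToList();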

Upvotes: 3
