Frode Akselsen

Reputation: 676

Regex Capturing Group with alternative doesn't match

I have the following string where I want to match the valid <key>:<value> pairs.

A valid <key> is one or more non-whitespace characters followed by :
A valid <value> is either enclosed in [] or a string without whitespace.

key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar

Basically I want to match everything except nyet and :bar

I came up with the following regex, \S+:(\S+|\[[^]]+\]), but it doesn't seem to match the full expression key3:[@value#3, value4]. In the capturing group, the second alternative \[[^]]+\] should match this expression, so I don't understand why it doesn't.

The following regex works: \S+:([^([ )]+|\[[^\]]+\]) but doesn't feel elegant.

Questions:

  1. Why does the first regex \S+:(\S+|\[[^]]+\]) not work?
  2. How would a more elegant solution look to match the key value pairs?

Upvotes: 0

Views: 1766

Answers (2)

The fourth bird

Reputation: 163632

In the pattern you can switch the alternatives, \S+:(\[[^]]+\]|\S+), but the capture group would then also include the [ and ] in that case.

You could also exclude matching the : in the first part [^\s:]+:(\[[^]]+]|\S+) using a negated character class.
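Both points can be checked on concrete input. The following is a minimal Python sketch (Python is an assumption, since the question names no language). It writes the bracket class as [^\]] for portability, adds a capture group around the key purely for illustration, and uses a made-up token a:b:c to show what excluding : from the key changes.

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

# Alternatives switched: the bracketed form is tried before \S+.
swapped = re.compile(r"\S+:(\[[^\]]+\]|\S+)")
values = [m.group(1) for m in swapped.finditer(text)]
# Group 1 keeps the surrounding brackets for key3 and key4.

# Hypothetical token with two colons (not part of the question's input).
token = "a:b:c"
greedy_key = re.fullmatch(r"(\S+):(\[[^\]]+\]|\S+)", token)
strict_key = re.fullmatch(r"([^\s:]+):(\[[^\]]+\]|\S+)", token)

print(values)
print(greedy_key.groups())  # key backtracks to the last colon
print(strict_key.groups())  # key stops at the first colon
```

With \S+ the key is greedy and ends at the last colon of the token; with [^\s:]+ it ends at the first one.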

To capture the value without the brackets, you could use an alternation with two capture groups and check whether group 2 or group 3 participated in the match.

([^\s:]+):(?:\[([^][]+)]|(\S+))

The pattern matches:

  • ([^\s:]+) Capture group 1, match 1+ chars other than a whitespace char or :
  • : Match the :
  • (?: Non capture group
    • \[([^][]+)] Match [, capture in group 2 1+ chars other than [ and ], then match the closing ]
    • | or
    • (\S+) Capture 1+ non whitespace chars in group 3
  • ) Close non capture group
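As a rough illustration of checking group 2 or group 3, here is a minimal Python sketch run against the string from the question. Python is an assumption, and the character class is written with explicit escapes as [^\]\[], which is intended to be equivalent to [^][].

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

pair = re.compile(r"([^\s:]+):(?:\[([^\]\[]+)\]|(\S+))")

pairs = []
for m in pair.finditer(text):
    key = m.group(1)
    # Only one of group 2 (bracketed value) and group 3 (bare value)
    # participates in a match; the other one is None.
    value = m.group(2) if m.group(2) is not None else m.group(3)
    pairs.append((key, value))

for key, value in pairs:
    print(f"{key!r} -> {value!r}")
```

Neither nyet nor :bar produces a pair: the first has no colon, the second has no key characters before it.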

If a conditional is supported, you could check whether group 2 has captured a [. If it did, you can capture any char except the brackets in group 3.

The values you want are then in group 1 and group 3.

([^\s:]+):(?:(\[)(?=[^][]*]))?((?(2)[^][]+|\S+))\]?
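Python's re module is one engine that supports the (?(2)...|...) conditional, so the pattern can be tried there. This is a minimal sketch under that assumption, again with the brackets escaped inside the character classes.

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

conditional = re.compile(
    r"([^\s:]+):"               # group 1: the key
    r"(?:(\[)(?=[^\]\[]*\]))?"  # group 2: optional [ that has a closing ]
    r"((?(2)[^\]\[]+|\S+))"     # group 3: value, chosen by whether group 2 matched
    r"\]?"                      # optional closing ]
)

# The key is always in group 1 and the value always in group 3.
pairs = [(m.group(1), m.group(3)) for m in conditional.finditer(text)]
print(pairs)
```

Compared with the alternation version, the value always lands in the same group, at the cost of a pattern that only works in engines with conditionals.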

Upvotes: 1

Peter Thoeny

Reputation: 7616

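  1. You were close with your regex. It failed because alternatives are tried left to right: \S+ comes first and also matches [, so after the : it consumes [@value#3, up to the whitespace and the \[...\] alternative is never tried.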
  2. This regex works:
/\S+:(?:\[[^\]]*\]|\S+)/g

Explanation:

  • \S+: - 1+ non-space chars and a colon
  • (?: - non-capturing group start (for OR)
    • \[[^\]]*\] - [...] pattern
    • | - logical OR
    • \S+ - 1+ non-space chars
  • ) - non-capturing group end
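The effect of the alternation order can be seen by running both patterns over the string from the question. The sketch below uses Python, which is an assumption: re.findall and re.finditer stand in for the /g flag, and the original pattern is written with [^\]] instead of [^]].

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

# Original order: \S+ is tried first and also accepts "[".
original = re.compile(r"\S+:(\S+|\[[^\]]+\])")
# Order from this answer: the bracketed form is tried first.
fixed = re.compile(r"\S+:(?:\[[^\]]*\]|\S+)")

before = [m.group(0) for m in original.finditer(text)]
after = fixed.findall(text)  # no capture groups, so full matches are returned

print(before)  # key3 is cut off at the first whitespace
print(after)   # key3 keeps its whole bracketed value
```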

Upvotes: 1
