lo7

Reputation: 445

Get words in parentheses as a group with regex

String1: {{word1|word2|word3 (word4 word5)|word6}}

String2: {{word1|word2|word3|word6}}

With this regex:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})

I can capture the parts of String2 as groups. How can I change the regex so that it also captures (word4 word5) as a group?
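For reference, here is a minimal sketch of the behaviour described above. It assumes Python's re module (the question does not name a regex flavor) and uses the pattern and both sample strings unchanged; the variable names are made up for illustration.

```python
import re

# Pattern from the question, copied verbatim.
ORIGINAL = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?=\}\})"
)

string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"

# String2 matches and yields four groups.
match2 = ORIGINAL.search(string2)
print(match2.groups())

# String1 does not match: after "word3" the pattern expects "|" (or more
# words), but it finds " (" instead.
match1 = ORIGINAL.search(string1)
print(match1)
```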

Upvotes: 2

Views: 842

Answers (2)

Cary Swoveland

Reputation: 110685

You could simplify the expression by matching the desired substrings rather than capturing them. For that you could use the following regular expression.

(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)

Regex demo <¯\(ツ)> Python demo

The elements of the expression are as follows.

(?<=     # begin a positive lookbehind
  [{| ]  # match one of the indicated characters
)        # end the positive lookbehind
\w+      # match one or more word characters
(?=      # begin a positive lookahead
  [}| ]  # match one of the indicated characters
)        # end positive lookahead
|        # or
\(       # match a literal '('
[\w ]+   # match one or more word characters or spaces
\)       # match a literal ')'

Note that this does not validate the format of the string.
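As a rough sketch of how this could be used from Python (the `tokens` helper and the variable names are hypothetical, not part of the linked demo), `re.findall` returns the full matches because the pattern has no capturing groups. The last call illustrates the caveat above: a string without any braces still yields a match.

```python
import re

# Pattern from this answer, copied verbatim.
TOKEN = re.compile(r"(?<=[{| ])\w+(?=[}| ])|\([\w ]+\)")

def tokens(text):
    """Return every word or parenthesized run matched by TOKEN."""
    return TOKEN.findall(text)

string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"

print(tokens(string1))  # ['word1', 'word2', 'word3', '(word4 word5)', 'word6']
print(tokens(string2))  # ['word1', 'word2', 'word3', 'word6']

# No format validation: plain text with no {{...}} wrapper still matches.
print(tokens("foo bar baz"))  # ['bar']
```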

Upvotes: 1

Wiktor Stribiżew

Reputation: 626870

You can add a (?:\s*(\([^()]*\)))? subpattern:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})

See the regex demo.

The (?:\s*(\([^()]*\)))? part is an optional non-capturing group that matches one or zero occurrences of

  • \s* - zero or more whitespaces
  • ( - start of a capturing group:
    • \( - a ( char
    • [^()]* - zero or more chars other than ( and )
    • \) - a ) char
  • ) - end of the group.

If you need to make sure only whitespace-separated words are allowed inside the parentheses, replace [^()]* with \w+(?:\s+\w+)*, so that the inserted subpattern becomes (?:\s*(\(\w+(?:\s+\w+)*\)))?:

(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)(?:\s*(\(\w+(?:\s+\w+)*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})

See this regex demo.
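A minimal sketch of both variants, assuming Python's re module (the names LOOSE, STRICT and the third sample string are made up for illustration). The optional group is group 4, so it is None when the parenthesized part is absent.

```python
import re

# Both patterns are copied verbatim from this answer.
LOOSE = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)"
    r"(?:\s*(\([^()]*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})"
)
STRICT = re.compile(
    r"(?<=\{\{)(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)\|(\w+(?:\s+\w+)*)"
    r"(?:\s*(\(\w+(?:\s+\w+)*\)))?\|(\w+(?:\s+\w+)*)(?=\}\})"
)

string1 = "{{word1|word2|word3 (word4 word5)|word6}}"
string2 = "{{word1|word2|word3|word6}}"
# Hypothetical input with a comma inside the parentheses.
string3 = "{{word1|word2|word3 (w4, w5)|word6}}"

groups1 = LOOSE.search(string1).groups()
groups2 = LOOSE.search(string2).groups()
print(groups1)  # ('word1', 'word2', 'word3', '(word4 word5)', 'word6')
print(groups2)  # ('word1', 'word2', 'word3', None, 'word6')

# The loose variant accepts the comma; the strict one rejects the string.
print(LOOSE.search(string3).group(4))  # (w4, w5)
print(STRICT.search(string3))          # None
```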

Upvotes: 1
