Ultranium

Reputation: 372

Apply multiple conditions to a capturing group

I need to extract from a text all the words which match these two requirements:

  1. Contain at least one uppercase letter
  2. Don't fully consist of uppercase characters.

So, Word and WorD are correct captures, but word and WORD aren't.

I can capture all the words with the regex \b([a-zA-Z]+)\b, but I don't know how to add the uppercase-letter conditions to it.

As for requirement #1, I tried a positive lookahead like this:

\b(?=.*[A-Z]+)([a-zA-Z]+)\b, but now it captures all the words from a line if that line has at least one uppercase letter.
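For illustration, here is a minimal Python sketch of that behaviour (the question does not say which regex engine is used, so Python's `re` module and the sample strings are assumptions). The `.*` inside the lookahead can scan past the end of the current word, so an uppercase letter anywhere later on the line satisfies it:

```python
import re

# The attempted pattern from the question: the lookahead uses `.*`,
# which is not limited to the current word.
attempt = re.compile(r"\b(?=.*[A-Z]+)([a-zA-Z]+)\b")

# Hypothetical sample lines, chosen only to show the effect.
captured = attempt.findall("word WORD Word")
no_uppercase = attempt.findall("plain words only")

print(captured)      # every word is captured, including "word" and "WORD"
print(no_uppercase)  # nothing is captured on a line without uppercase letters
```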

Is it even possible to apply additional conditions to a capturing group? I can process this in my application's code but I'd really prefer to fit all those requirements in a single Regex.

Upvotes: 1

Views: 451

Answers (1)

Wiktor Stribiżew

Reputation: 626690

You may use

\b(?=[A-Z]*[a-z])(?=[a-z]*[A-Z])([a-zA-Z]+)\b

See the regex demo

Actually, you do not even need the capturing group: ([a-zA-Z]+) can usually be replaced with [a-zA-Z]+, depending on where you are using the regex.

Details

  • \b - word boundary
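  • (?=[A-Z]*[a-z]) - a positive lookahead that requires a lowercase letter after 0+ uppercase ones (so the word contains at least one lowercase letter)
  • (?=[a-z]*[A-Z]) - a positive lookahead that requires an uppercase letter after 0+ lowercase ones (so the word contains at least one uppercase letter)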
  • ([a-zA-Z]+) - Group 1: 1 or more letters
  • \b - a word boundary.
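As a quick check of the pattern above, here is a small sketch in Python (the engine is an assumption, since the question does not name one; `mixed_case_words` is a hypothetical helper name used only for this example):

```python
import re

# The pattern from the answer: both lookaheads are restricted to letters,
# so they cannot look past the current word.
MIXED_CASE = re.compile(r"\b(?=[A-Z]*[a-z])(?=[a-z]*[A-Z])([a-zA-Z]+)\b")


def mixed_case_words(text):
    """Return words that contain both an uppercase and a lowercase letter."""
    # findall returns the contents of Group 1 for every match.
    return MIXED_CASE.findall(text)


print(mixed_case_words("Word WorD word WORD"))  # ['Word', 'WorD']
```

Because the character classes in the lookaheads only accept letters, each check stops at the first non-letter, which is what keeps the two conditions scoped to a single word.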

Upvotes: 1
